DNA SEQUENCES CODING FOR A PROTEIN CONFERRING MALE STERILITY

This invention relates to recombinant, isolated and other synthetic DNA useful in male-sterility systems for plants. In particular, the invention relates to a gene associated with male fertility, labelled Ms41-A, and a recessive mutant form thereof, labelled ms41-A, which confers male sterility. Male-sterile plants are useful for the production of hybrid plants by sexual hybridisation.

Hybrid plants have the advantages of higher yield and better disease resistance than their parents, because of heterosis or hybrid vigour. Crop uniformity is another advantage of hybrid plants when the parents are extensively homozygous; this leads to improved crop management. Hybrid seed is therefore commercially important and sells at a premium price.

Producing a hybrid plant entails ensuring that the female parent does not self-fertilise. There have been many prior proposals, mechanical, chemical and genetic, for preventing self-pollination. Among the genetic methods is the use of anther-specific genes or their promoters to disrupt the normal production of pollen grains. An anther-specific promoter, for example, can be used to drive a "male-sterility DNA" at the appropriate time and in the right place. Male-sterility DNAs include those coding for lytic enzymes, including those that lyse proteins, nucleic acids and carbohydrates. Glucanases are enzymes which break down carbohydrates.



WO-A-9302197 describes recombinant or isolated DNA encoding a glucanase called callase.

Aarts et al. (Nature, 363:715-717 (1993)) have described a gene required for male fertility, isolated from Arabidopsis, which has been labelled Ms2.

We have now identified and isolated from Arabidopsis another gene linked to male fertility. This gene has been labelled Ms41-A. Its mutant, recessive form is labelled ms41-A and is capable of conferring male sterility. This gene would appear to offer advantages over Ms2 when used to produce male-sterile plants; for example, under conditions in which ms2 mutant plants reverted to male fertility, ms41-A plants did not (see Example 1).

Thus, in a first aspect the present invention provides recombinant or isolated nucleic acid which:

a) encodes the Ms41-A protein from Arabidopsis;

b) encodes a Ms41-A like protein;

c) encodes the ms41-A protein from Arabidopsis;

d) encodes a ms41-A like protein;

e) comprises a promoter sequence which regulates expression of the Ms41-A protein from Arabidopsis or a promoter sequence which regulates expression of a Ms41-A like protein; or

f) hybridises under stringent conditions to nucleic acid a), b), c), d) or e), or would do so but for the degeneracy of the genetic code.

In one embodiment of a) above, the nucleic acid encodes a protein having an amino acid sequence as shown in figure 4. Although figure 4 relates only to a protein of Arabidopsis, those skilled in the art will readily be able to identify equivalent proteins from other members of the family Brassicaceae, or indeed similar proteins from other commercially important plant families, ie Ms41-A like proteins.

In turn the equivalent genes may be identified by hybridisation studies, restriction fragment length polymorphism (RFLP), degenerate PCR and other methods known in the art. Genes or other DNA sequences, whether natural, engineered or synthetic, encoding closely equivalent proteins may for example hybridise under stringent conditions (such as at approximately 35°C to 65°C in a salt solution of approximately 0.9 molar) to the Arabidopsis gene, or to fragments of it of, for example, 10, 20, 50 or 100 nucleotides. A 15-20 nucleotide probe would be appropriate under many circumstances.
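
As a rough illustration of how probe length and base composition relate to the hybridisation temperatures quoted above, the following Python sketch applies the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C. This rule is not part of the specification; it is a common approximation for short oligonucleotides in high-salt buffers, and the function name `wallace_tm` and the example probe are hypothetical.

```python
def wallace_tm(probe: str) -> int:
    """Approximate melting temperature (deg C) of a short DNA probe.

    Wallace rule of thumb: Tm = 2*(A+T) + 4*(G+C). Only meaningful for
    short oligonucleotides (roughly 14-20 nt) in high-salt buffer.
    """
    seq = probe.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("probe must contain only A, C, G and T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 16-mer probe with 8 A/T and 8 G/C bases.
print(wallace_tm("ATGCATGCATGCATGC"))  # 48
```

For probes of 15-20 nucleotides the rule gives values from 30 °C (15 nt, all A/T) to 80 °C (20 nt, all G/C), a range that brackets the 35 °C to 65 °C hybridisation temperatures mentioned above. The rule is not intended for longer probes.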

In the context of the present invention, "nucleic acid which encodes" includes all nucleic acid, eg DNA sequences, which will, when expressed, give rise to the protein. Examples of such DNA sequences include, but are not limited to, ones which comprise non-coding regions, eg introns, sequences which include leader sequences and/or signal sequences, or sequences which simply comprise a coding sequence for the protein. The skilled person will also appreciate that, due to codon degeneracy, there will, for example, be a number of DNA sequences capable of coding for the Ms41-A protein or a Ms41-A like protein.
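
The effect of codon degeneracy can be made concrete by counting how many DNA sequences encode a given peptide. The sketch below assumes the standard genetic code; the dictionary `CODONS_PER_AA`, the function `count_coding_sequences` and the example peptide are illustrative and are not taken from figure 4.

```python
from math import prod

# Number of sense codons for each amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_coding_sequences(protein: str, stop_codons: int = 3) -> int:
    """Count distinct DNA coding sequences for a protein (one-letter code).

    Multiplies the number of synonymous codons at each position and, by
    default, the three possible stop codons (TAA, TAG, TGA) at the end.
    """
    per_position = [CODONS_PER_AA[aa] for aa in protein.upper()]
    return prod(per_position) * stop_codons


print(count_coding_sequences("MSL"))  # 108
```

Even the tripeptide Met-Ser-Leu followed by a stop codon can be encoded by 1 × 6 × 6 × 3 = 108 different DNA sequences, and the number grows multiplicatively with every added residue.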






In general, the nucleic acid of the invention will comprise at least a direct coding sequence for the protein as well as a promoter and a transcription termination sequence. The promoter can itself comprise only those sequences, or elements, necessary for the correct initiation of transcription (which regions can be described as transcription initiation regions, for instance), or, alternatively, it can include regions of sequence which are not directly involved in the initiation of transcription, i.e. a complete promoter can be employed.

A preferred coding sequence described in this specification is from Arabidopsis and can be isolated by methods known in the art, for example by (a) synthesising cDNA from mRNA isolated from Arabidopsis, (b) isolating this cDNA, (c) using this cDNA as a probe to identify regions of the genome of a chosen member of another plant species, eg maize, that encode the mRNA of interest, and (d) identifying the upstream (5') regulatory regions that contain the promoter of this DNA.

A particularly preferred DNA sequence is that shown in figure 3, and more particularly the sequence shown in figure 3 which commences with the base pair labelled 1, as will subsequently be described in the examples. Those skilled in the art will, with the information given in this specification, be able to identify with sufficient precision the coding regions and to isolate and/or recombine DNA containing them.

The nucleic acid of the invention can be used to confer male sterility on plants. For instance, the recessive form of the gene, ie ms41-A, can be used to transform a plant. Alternatively, the dominant form, ie Ms41-A, can be downregulated, for example by the antisense, ribozyme or other methods described below.

As discussed herein, the nucleic acid can include a promoter, and to increase the likelihood of male sterility being conferred it is possible to use promoters which drive expression in particular plant tissues involved in the control of fertility. Examples of such promoters are those which are tapetum-specific, for example a Brassicaceae A3 or A9 promoter, described in WO-A-9211379, and the A6 promoter described in WO-A-9302197. Both WO-A-9211379 and WO-A-9302197 are hereby incorporated by reference.

Because of the natural specificity of the regulation of 
expression of the Ms41-A or Ms41-A like gene, it is not 
necessary for the Ms41-A promoter to be linked to 
specific disrupter DNA to provide a useful male-sterility 
system (although it can be); non-specific disrupter DNA 
can be used. 

Ms41-A like promoters from other plant species, eg from maize, and modified Ms41-A promoters can be used, and if necessary located or identified and isolated as described above for the Ms41-A coding sequences, mutatis mutandis.

Ms41-A or Ms41-A like promoter-containing DNA in accordance with the invention can, as indicated above, be used to confer male sterility on plants, particularly those belonging to the family Brassicaceae, in a variety of ways as will be discussed below. In an important embodiment of the invention, therefore, a promoter as described above is operatively linked to DNA which, when expressed, causes male sterility.

Since an effective sterility system is complete, propagation of the seed parent must proceed either by asexual means or via pollination of the male-sterile plants by an isogenic male-fertile line, followed by identification or selection of male-sterile plants among the offspring. Where vegetative propagation is practical, the present invention forms a complete system for hybrid production. Where fertility restoration is necessary to produce a seed crop, the present invention forms the basis of a new male sterility system. In some seed crops where the level of cross-pollination is high, seed mixtures may enable restoration to be bypassed. The male sterility will be particularly useful in crops where restoration of fertility is not required, such as the vegetable Brassica spp. and other edible plants such as lettuce, spinach and onions.

Nucleic acid in accordance with the invention and incorporating the Ms41-A or Ms41-A like promoter can drive male sterility DNA, thereby producing male sterile plants which can be used in hybrid production.

A construct comprising a promoter operatively linked to a male sterility DNA can be transformed into plants (particularly those of the genus Brassica, but also other genera such as Nicotiana and Hordeum) by methods which may be well known in themselves. This transformation results in the production of plants, the cells of which contain a foreign chimeric DNA sequence composed of the promoter and a male sterility DNA. Male-sterility DNA encodes an RNA, protein or polypeptide which, when produced or over-produced in a stamen cell of the plant, prevents the normal development of the stamen cell.

The Ms41-A or Ms41-A like promoter may be used to drive a variety of male sterility DNA sequences which code for RNAs, proteins or polypeptides that bring about the failure of mechanisms to produce viable male gametes. The invention is not limited by the sequence driven, but a number of classes and particular examples of promoter-drivable male sterility sequences are preferred.

For example, the drivable male sterility DNA may encode a lytic enzyme. The lytic enzyme may cause degradation of one or more biologically important molecules, such as macromolecules including nucleic acid, protein (or glycoprotein), carbohydrate and (in some circumstances) lipid.

Ribonucleases (such as RNase T1 and barnase) are examples of enzymes which cause lysis of RNA. Examples of enzymes which lyse DNA include exonucleases and endonucleases, whether site-specific, such as EcoRI, or non-site-specific.

Actinidin is an example of a protease, DNA coding for 
which can be suitable male sterility DNA. Other examples 
include papain zymogen and papain active protein. 


Lipases whose corresponding nucleic acids may be useful as male sterility DNAs include phospholipase A2.

Male sterility DNA does not have to encode a lytic enzyme. Other examples of male sterility DNA encode enzymes which catalyse the synthesis of phytohormones, such as isopentenyl transferase, which is involved in cytokinin synthesis, and one or more of the enzymes involved in the synthesis of auxin. DNA coding for a lipoxygenase or other enzymes having a deleterious effect may also be used.

As mentioned above, one way to confer male sterility will be to downregulate the Ms41-A or Ms41-A like gene. This could be achieved by the use of antisense DNA. Introducing the coding region of a gene in the reverse orientation to that found in nature can result in the down-regulation of the gene and hence the production of less or none of the gene product. The RNA transcribed from antisense DNA is capable of binding to, and destroying the function of, a sense RNA version of the sequence normally found in the cell, thereby disrupting function.
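
A minimal sketch of the base-pairing logic behind antisense DNA: the antisense construct carries the reverse complement of the coding strand, so its transcript is complementary, and antiparallel, to the sense mRNA. The function names and the short example sequence are hypothetical; the sketch models sequence complementarity only, not the cellular mechanism of down-regulation.

```python
# Complement table for DNA bases (upper and lower case).
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def antisense(coding_strand: str) -> str:
    """Reverse complement of a DNA coding strand, written 5' to 3'."""
    return coding_strand.translate(DNA_COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """RNA transcribed from a DNA sequence given as the non-template strand."""
    return dna.upper().replace("T", "U")


def fully_complementary(rna_a: str, rna_b: str) -> bool:
    """True if two 5'->3' RNA strands can pair antiparallel along their whole length."""
    if len(rna_a) != len(rna_b):
        return False
    return all((a, b) in RNA_PAIRS for a, b in zip(rna_a, reversed(rna_b)))


coding = "ATGGCC"  # hypothetical fragment of a coding strand
sense_rna = transcribe(coding)                 # "AUGGCC"
antisense_rna = transcribe(antisense(coding))  # "GGCCAU"
print(fully_complementary(sense_rna, antisense_rna))  # True
```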


It is not crucial for antisense DNA to be transcribed solely at the time when the natural sense transcription product is being produced. Antisense RNA will in general bind only to its complementary sense strand, and so will have its toxic effect only when the sense RNA is transcribed. Antisense DNA corresponding to some or all of the DNA encoding the Ms41-A or Ms41-A like gene product may therefore be expressed at other times as well as while the gene is being expressed. Such antisense DNA may be expressed constitutively, under the control of any appropriate promoter.

It is also the case that one may wish to restore male fertility in later generations; this can also be achieved using antisense nucleic acid, eg nucleic acid which is antisense for a DNA molecule encoding ms41-A.

Thus, in a second aspect, the present invention provides antisense nucleic acid which includes a transcribable strand of DNA complementary to at least a part of a DNA molecule of the invention.

In one embodiment of this aspect the antisense nucleic acid is under the control of a constitutive promoter, such as the CaMV 35S promoter.

WO 97/23618    PCT/GB96/03191



A still further example of male sterility DNA encodes an RNA enzyme (known as a ribozyme) capable of highly specific cleavage against a given target sequence (Haseloff and Gerlach, Nature, 334:585-591 (1988)). Like antisense DNA, ribozyme DNA (coding in this instance for a ribozyme which is targeted against the RNA encoded by the Ms41-A or Ms41-A like gene) does not have to be expressed only at the time of expression of the Ms41-A or Ms41-A like gene. Again, it may be possible to use any appropriate promoter to drive ribozyme-encoding DNA, including one which is adapted for constitutive expression.






According to a further aspect of the invention, there is therefore provided DNA encoding a ribozyme capable of specific cleavage of RNA encoded by a DNA molecule of the invention. Such ribozyme-encoding DNA would be useful in conferring male sterility on members of, eg, the family Brassicaceae.



In addition, there are other useful methods which can be employed for the downregulation of the Ms41-A or Ms41-A like DNA sequences. Some examples of these are as follows:

i) expression of an antibody or antibodies, or domains or fragments thereof, against the Ms41-A or a Ms41-A like protein;

ii) expression of mutant versions of the Ms41-A or of a Ms41-A like protein which may interfere with the function of the normal protein;

iii) creation of mutations in the Ms41-A sequence or the Ms41-A like sequence, with the result that mutant plants can be used in the recessive AMS system as hereinbefore described; and

iv) expression of mRNA binding proteins that will interfere specifically with Ms41-A or Ms41-A like transcription.

In preferred embodiments of DNA sequences of this invention, 3' transcription regulation signals, including a polyadenylation signal, may be provided. Preferred 3' transcription regulation signals are derived from the Cauliflower Mosaic Virus 35S gene. It should be recognised that other 3' transcription regulation signals could also be used.

Recombinant DNA in accordance with the invention may be in the form of a vector. The vector may for example be a plasmid, cosmid or phage. Vectors will frequently include one or more selectable markers to enable selection of cells transfected (or transformed: the terms are used interchangeably in this specification) with them and, preferably, to enable selection of cells harbouring vectors incorporating heterologous DNA. Appropriate start and stop signals will generally be present. Additionally, if the vector is intended for expression, sufficient regulatory sequences to drive expression will be present; however, DNA in accordance with the invention will generally be expressed in plant cells, and so microbial host expression would not be among the primary objectives of the invention, although it is not ruled out. Vectors not including regulatory sequences are useful as cloning vectors. Cloning vectors can be introduced into E. coli or another suitable host which facilitates their manipulation.

According to another aspect of the invention, there is therefore provided a host cell transfected or transformed with DNA as described above.

DNA in accordance with the invention can be prepared by any convenient method involving coupling together successive nucleotides, and/or ligating oligo- and/or poly-nucleotides, including in vitro processes, but recombinant DNA technology forms the method of choice.



Ultimately, DNA in accordance with the invention (whether (i) the Ms41-A gene, the ms41-A gene, a Ms41-A like gene or a ms41-A like gene, (ii) antisense DNA to any option listed in (i), (iii) ribozyme DNA targeted to RNA for any option listed in (i), or (iv) DNA comprising a promoter as described herein used to drive expression of a disrupter sequence, eg encoding barnase) will be introduced into plant cells by any suitable means.

According to a further aspect of the invention, there is 
provided a plant cell including DNA in accordance with 
the invention as described above. 


Preferably, DNA is transformed into plant cells using a disarmed Ti-plasmid vector and carried by Agrobacterium by procedures known in the art, for example as described in EP-A-0116718 and EP-A-0270822. Alternatively, the foreign DNA could be introduced directly into plant cells using an electrical discharge apparatus. This method is preferred where Agrobacterium is ineffective, for example where the recipient plant is monocotyledonous. Any other method that provides for the stable incorporation of the DNA within the nuclear DNA of any plant cell of any species would also be suitable. This includes species of plant which are not currently capable of genetic transformation.


Preferably DNA in accordance with the invention also contains a second chimeric gene (a "marker" gene) that enables a transformed plant containing the foreign DNA to be easily distinguished from other plants that do not contain the foreign DNA. Examples of such a marker gene include antibiotic resistance (Herrera-Estrella et al., EMBO J. 2, 987-995 (1983)), herbicide resistance (EP-A-0242246) and glucuronidase (GUS) expression (EP-A-0344029). Expression of the marker gene is preferably controlled by a second promoter which allows expression in cells other than the tapetum, thus allowing selection of cells or tissue containing the marker at any stage of regeneration of the plant. The preferred second promoter is derived from the gene which encodes the 35S subunit of Cauliflower Mosaic Virus (CaMV) coat protein. However, any other suitable second promoter could be used.

A whole plant can be regenerated from a single transformed plant cell, and the invention therefore provides transgenic plants (or parts of them, such as propagating material) including DNA in accordance with the invention as described above. The regeneration can proceed by known methods. When the transformed plant flowers, it can be seen to be male sterile by its inability to produce viable pollen. Where pollen is produced, it can be confirmed to be non-viable by its inability to effect seed set on a recipient plant.

Preferred features of each aspect of the invention are as for each other aspect, mutatis mutandis.

The invention will now be illustrated by a number of non-limiting examples, which refer to the accompanying drawings, in which:

FIGURE 1: shows a Southern blot of HindIII-cut genomic DNA from 21 ms41-A plants, demonstrating linkage of the 35S-Ac element to ms41-A;

FIGURE 2: shows a schematic diagram of the region containing the MS41-A locus cloned in lambda MSE3. The position of insertion of the 35S-Ac is indicated; B, BamHI; E, EcoRI; H, HindIII; S, SalI;

FIGURE 3: shows the genomic DNA sequence of the MS41-A gene. The sequence is numbered from the putative transcriptional start point of the MS41-A message. The predicted amino acid sequence of MS41-A is shown together with the restriction sites;

FIGURE 4: shows the predicted amino acid sequence of MS41-A;

FIGURE 5: shows the oligonucleotides used to examine excision events of 35S-Ac from the ms41-A locus;


FIGURE 6: shows DNA sequences left by 35S-Ac excision events at the ms41-A locus;






FIGURE 7: shows a diagram of the MS41-A promoter-GUS and MS41-A promoter-Barnase chimeric genes;

FIGURE 8: shows a diagram of the MS41-A promoter-antisense MS41-A and CaMV 35S promoter-antisense and sense MS41-A chimeric genes;

FIGURE 9: shows sequence alignments of proteins related to MS41-A;

FIGURE 10: shows a partial DNA sequence and 
predicted amino acid translation of Zm41-A; 

FIGURE 11: shows a dendrogram of MS41-A related sequences;

FIGURE 12: shows the nucleotide sequence of the Z31 Zm41-A gene. The portion of the sequence corresponding to the putative coding region is shown in bold capital letters. ♦ indicates the putative first methionine, deduced in frame with the Zm41-A cDNA and 5' RACE products. * indicates the start of the longest 5' RACE product. ▼ indicates the start of the Zm41-A cDNA. 12 exons are present and translation stops in exon 11; the stop codon is TGA (□). Non-spliced DNA present in some RACE products is underlined;


FIGURE 13: shows restriction maps of Z31, Z33 and Z35 genomic clones isolated with Zm41-A cDNA. EI, HIII, NI and SI indicate restriction sites of the endonucleases EcoRI, HindIII, NcoI and SalI, respectively. * indicates the start of the longest RACE product. ▼ indicates the start of the Zm41-A cDNA. Dotted lines indicate homologous regions and a indicates deletions;


FIGURE 14: shows a Clustal V alignment between the protein deduced from the Zm41-A cDNA and that deduced from the longest genomic open reading frame of Z31;

FIGURE 15: shows the nucleotide sequence of the Z33 Zm41-A gene. The portion of the sequence corresponding to DNA transcription is shown in bold capital letters. Non-spliced DNA present in some RACE products is underlined. This gene is truncated and only exons 3, 5 and 6 are present; and

FIGURE 16: shows the nucleotide sequence of the Z35 Zm41-A gene. The portion of the sequence corresponding to DNA transcription is shown in bold capital letters. Non-spliced DNA present in some RACE products is underlined. This gene is truncated and only exons 3, 4, 5 and 6 are present.

Example 1 

Isolation of a gene required for male fertility in Arabidopsis thaliana

i) Isolation and phenotype of the ms41-A male sterile mutant

The method used to identify a gene required for male fertility in Arabidopsis thaliana was transposon tagging. This method is a powerful technique for isolating genes which encode unknown products, allowing genes identified only by their mutant phenotype to be cloned.

Arabidopsis thaliana is a widely used model species and an ideal plant for transposon tagging of genes, since it is a transformable diploid with a very small genome; the chance of tagging desired genes is thus maximised. Additionally, Arabidopsis is a member of the Brassicaceae and is thus very closely related to important crop plants such as Brassica napus (oilseed rape).

Transposon tagging was achieved by transformation of C24 Arabidopsis roots with modified autonomous Ac elements from maize: D Ac and 35S Ac, inserted in the reverse orientation into the leader of the GUS reporter gene (constructs described in Finnegan et al., Plant Molecular Biology, 22:625-633 (1993)). As this work was in progress, the first reports of gene tagging with similar Ac elements in heterologous plant species were published: a pH-controlling gene from Petunia (Chuck et al., Plant Cell, 5:371-378 (1993)), the Arabidopsis DRL1 locus (Bancroft et al., Plant Cell, 5:631-638 (1993)) and the Arabidopsis Albino gene (Long et al., Proceedings of the National Academy of Sciences U.S.A., 90:10370-10374 (1993)).

Transformed plants were regenerated and the T2 progeny analysed for GUS activity and by molecular analysis. This demonstrated that the 35S Ac transposed quite efficiently (in 30% to 40% of progeny). The T3 progeny families derived from 279 selected T1 plants were then visually screened for mutants affected in male fertility. A few fertility-reduced or sterile plants were recovered, some possessing additional abnormalities. A male sterile mutant (ms41-A) which appeared in family 41 had collapsed anthers with empty locules. Only one sterile plant was recovered from more than 2000 T3 siblings in this family. After cross-pollination with wild type pollen, elongation of siliques was observed, confirming that female fertility is unaffected by the mutation.



From the above cross, 21 F1 individuals were grown and allowed to self-pollinate to produce F2 seed; all the F1 plants were completely fertile, suggesting that the mutation is recessive. The first analysis of 6 different F2 populations confirmed the recessive character of the mutation, as male sterility reappeared in a small proportion of each F2 population, with all other siblings presenting a wild type phenotype. Moreover, the vegetative development of the male sterile plants was identical to that of wild type C24 Arabidopsis. The observed frequency of male transmission of the mutation suggests non-classical Mendelian inheritance for a single recessive mutation: the frequencies of mutant plants in the six F2 populations were 16.8%, 13.0%, 11.9%, 12.7%, 15.4% and 17.0%. The expected frequency of mutant plants is 25%, ie a 3 to 1 ratio of wild type to mutant plants. In this case there is a ratio of approximately 7 to 1 wild type to mutant plants. A homogeneity test on the data of the 6 F2 populations concludes that there is homogeneous transmission of the male sterile phenotype (chi-square with 5 degrees of freedom = 8.69, 0.10 < P < 0.20).
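
The low F2 mutant frequencies and the homogeneity statistic can both be made explicit with a minimal Python sketch. It assumes Mendelian transmission through the female side (consistent with the unaffected female fertility noted above), so that an F2 mutant frequency f implies a male transmission rate t = 2f; it also evaluates the chi-square upper-tail probability from standard closed forms, checking that 8.69 with 5 degrees of freedom lies between P = 0.10 and P = 0.20. The per-population counts behind the statistic are not given here, so the sketch does not recompute 8.69 itself, and the function names are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j-1/2) / (1*3*...*(2j-1))
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def implied_male_transmission(mutant_fraction: float) -> float:
    """Share of effective pollen carrying the mutant allele, t_male.

    Assumes Mendelian female transmission (1/2), so the F2 mutant
    frequency is f = 0.5 * t_male and therefore t_male = 2 * f.
    """
    return 2.0 * mutant_fraction


f2_mutant_percent = [16.8, 13.0, 11.9, 12.7, 15.4, 17.0]  # values quoted in the text
for pct in f2_mutant_percent:
    t = implied_male_transmission(pct / 100.0)
    print(f"{pct:4.1f}% mutants -> implied male transmission {t:.2f} (Mendelian: 0.50)")

print(f"P(chi-square >= 8.69, df = 5) = {chi2_sf(8.69, 5):.3f}")
```

Under this simple model all six populations imply a male transmission rate between about 0.24 and 0.34, compared with 0.50 expected for a normally transmitted allele.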



Proof of reduced transmission of ms41-A through the male gametophyte was obtained by genetic mapping of Ms41-A. The hypothesis was that markers genetically linked to Ms41-A but present on the homologous chromosome (in repulsion) in an F1 cross with an ms41-A plant should be over-represented in the derived F2 population. The F1 crosses were made with 5 tester lines, one for each chromosome, constructed by Maarten Koornneef (described in O'Brien S.J. (ed.), Genetic maps of complex genomes, Book 6, Plants, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, pp 94-97 (1990)), and linkage of Ms41-A was demonstrated with markers on the lower part of chromosome 1. Compiled recombination data from 2 populations (476 and 540 individuals) were analysed with the MapMaker software, version 2 (Lander et al., Genomics, 1:174-181 (1987)).

Ms41-A lies between apetala 1 (8.1 cM) and glabra 2 (9.8 cM), and is 40.2 cM away from chlorina 1. In the first F2 population, the deficit of ms41-A plants was observed as before (14.7% of plants were male sterile), and it was correlated with the expected increase in apetala 1 and glabra 2 plants (29% and 31.5% respectively); the most distal marker, chlorina 1, behaves quite normally (22.3%). In the second F2 population, where the penetrance of ms41-A is less affected (18.3%), the over-representation is, as expected, less pronounced; only the proportion of glabra 2 plants appears to be slightly increased (27.2%).
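
Map distances such as those above are derived from recombination fractions through a mapping function. The text does not state which mapping function was used in MapMaker for these data, so the sketch below shows two standard choices, Haldane (no crossover interference) and Kosambi, converting the quoted distances back into approximate recombination fractions. The function names are illustrative.

```python
import math


def haldane_r(cm: float) -> float:
    """Recombination fraction for a map distance in cM, Haldane mapping function."""
    d = cm / 100.0  # convert centimorgans to morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi_r(cm: float) -> float:
    """Recombination fraction for a map distance in cM, Kosambi mapping function."""
    d = cm / 100.0
    return 0.5 * math.tanh(2.0 * d)


# Distances of marker loci from Ms41-A, as quoted in the text.
distances_cm = {"apetala 1": 8.1, "glabra 2": 9.8, "chlorina 1": 40.2}
for marker, cm in distances_cm.items():
    print(f"{marker}: {cm} cM -> r = {haldane_r(cm):.3f} (Haldane), "
          f"{kosambi_r(cm):.3f} (Kosambi)")
```

For short distances both functions give a recombination fraction close to cM/100; at 40.2 cM they differ noticeably (about 0.28 versus 0.33), and neither can exceed 0.5.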

Microscopic observations of microsporogenesis in the male sterile ms41-A plants revealed that the tetrads release abnormal microspores which degenerate rapidly. With aniline blue staining the tetrads appear abnormal, with irregularly shaped cells and great variation in cell size. Moreover, there is a mixed population of meiocytes, dyads (a stage not usually observed in Arabidopsis) and tetrads in the same anther. The defect apparently lies just before or during meiosis. Cytological observations on fixed young anther buds reinforce this finding, since at meiosis the meiocytes are affected but the tapetum behaves normally. No differences were observed cytologically between the ms41-A heterozygote and wild type plants.

One other gene required for male fertility (also in Arabidopsis) has been described previously (Aarts et al., Nature, 363:715-717 (1993)). Plants with a mutation in this gene (Ms2) were grown together with ms41-A plants. Under certain conditions, especially after the plants had been flowering for a long time, the ms2 but not the ms41-A plants reverted to male fertility.

ii) Linkage of a transposed 35S Ac with the mutant phenotype

To determine whether the ms41-A mutation was due to the insertion of a 35S-Ac element, HindIII-cut DNA from five ms41-A F1 individuals was analysed by Southern blotting using a 5' Ac fragment (the 2.5 kb EcoRI fragment from pBGS335RI; Finnegan et al., Plant Molecular Biology, 22:625-633 (1993)) as a probe. Two identical Ac bands were present in the five mutant plants:

- the internal Ac HindIII band of 1.6 kb; and

- a 3' Ac junction band of approximately 2.8 kb, which differs from that expected for the non-transposed 35S Ac (2.1 kb).

This indicates the presence of only one 35S Ac element, which transposed in the parental male sterile plant or, more likely, in its parents. To determine linkage between this 35S Ac element and the ms41-A phenotype, 24 ms41-A plants from each of 6 different F2 populations were analysed by PCR for the presence of the Ac element using the oligonucleotides:

5'h (5' AAGGATCCTGGCAAAGACATAAATC 3') and

Ac12 (5' AGATGCTGCTACCCAATCTTTTGTGC 3').

The results were as follows (ms41-A plants positive for the Ac element, out of 24 tested per population):

F2 population    Ac-positive plants
41-A-A           23/24
41-A-B            5/24
41-A-C           23/24
41-A-D           10/24
41-A-E           24/24
41-A-F            3/24

If the Ac element is linked to ms41-A, all male sterile plants should carry the Ac element; if it is not linked, only 3/4 of ms41-A plants should carry it. The results obtained indicate complete linkage only in the 41-A-E population. The apparent lack of linkage in the other populations may be due to frequent imprecise excision of the Ac element from the Ms41-A locus, leaving a mutation at that locus.
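
The linkage argument above can be quantified with the binomial distribution. If the Ac element were unlinked, each ms41-A plant would carry it with probability 3/4, so the chance that all 24 tested plants are Ac-positive is 0.75^24, about 0.001. The sketch below computes this tail probability for each population; the function name is hypothetical, and the counts are those tabulated above.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Ac-positive ms41-A plants out of 24 tested in each F2 population.
ac_positive = {"41-A-A": 23, "41-A-B": 5, "41-A-C": 23,
               "41-A-D": 10, "41-A-E": 24, "41-A-F": 3}
N_TESTED = 24
P_UNLINKED = 0.75  # expected carrier frequency if Ac and ms41-A are unlinked

for population, positives in ac_positive.items():
    tail = prob_at_least(positives, N_TESTED, P_UNLINKED)
    print(f"{population}: {positives}/{N_TESTED} Ac-positive, "
          f"P(>= {positives} | unlinked) = {tail:.3g}")
```

Only 41-A-E reaches the 24/24 expected under complete linkage; the remaining populations fall short, which the text attributes to imprecise excision of Ac rather than to an absence of linkage.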

To confirm linkage, the most stable population, 41-A-E, was analysed by Southern blotting with a probe that contained both a region of the transposed Ac element and 3' flanking plant DNA. To generate this probe, DNA from an ms41-A plant was digested with SspI, religated and amplified by PCR using the Ac oligonucleotides:

Ac11 (5' CGTATCGGTTTTCGATTACCGTATT 3') and

Ac12 (5' AGATGCTGCTACCCAATCTTTTGTGC 3').

The 1.1 kb inverse PCR (IPCR) fragment generated contained 500 bp of Ac, and the remainder consisted of 3' flanking Arabidopsis DNA.
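
Inverse PCR, as used above, amplifies unknown DNA flanking a known sequence: genomic DNA is cut with a restriction enzyme (here SspI, which recognises AATATT and cuts between AAT and ATT, leaving blunt ends), the fragments are self-ligated into circles, and primers in the known sequence pointing outwards amplify around the circle through the flanking DNA. The Python sketch below simulates this on a toy sequence; the sequences and function names are hypothetical. The recovered stretch is everything in the circle outside the primer-binding region, which in the experiment described includes part of the Ac element as well as the 3' flanking plant DNA.

```python
SSPI_SITE = "AATATT"
SSPI_CUT_OFFSET = 3  # SspI cuts AAT^ATT, producing blunt ends


def digest(seq: str, site: str = SSPI_SITE, offset: int = SSPI_CUT_OFFSET) -> list:
    """Cut a linear DNA sequence at every occurrence of a recognition site."""
    cut_positions, start = [], 0
    while (i := seq.find(site, start)) != -1:
        cut_positions.append(i + offset)
        start = i + 1
    bounds = [0] + cut_positions + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def inverse_pcr_product(seq: str, known: str) -> str:
    """Sequence recovered by inverse PCR outward from `known`.

    The restriction fragment containing `known` is self-ligated into a
    circle. Reading on from the 3' end of `known` passes through the
    downstream flank, the ligation junction and the upstream flank before
    returning to `known`; that stretch is returned (primer region excluded).
    """
    fragment = next(f for f in digest(seq) if known in f)
    i = fragment.index(known)
    return fragment[i + len(known):] + fragment[:i]


# Toy genome: flank, SspI site, flank, known element, flank, SspI site, flank.
genome = "GGG" + "AATATT" + "CCCC" + "ACGTACGT" + "TTTT" + "AATATT" + "GGG"
print(digest(genome))
print(inverse_pcr_product(genome, "ACGTACGT"))  # TTTTAATATTCCCC
```

The recovered sequence contains a regenerated SspI site (AATATT) at the ligation junction, because religating two blunt SspI ends restores the recognition sequence; this marks where the downstream and upstream flanks meet.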

DNA from plants of the F2 population 41-A-E was digested with HindIII and probed with the 3' IPCR fragment. 21 new F2 mutant individuals and 28 male fertile F2 plants were analysed; the selfed progenies of the latter were checked for the presence of mutant plants, revealing that 15 of the 28 were heterozygous for ms41-A. All of the 21 mutant plants (Figure 1) and those heterozygotes segregating the mutation in the F3 showed the same transposed 35S Ac, revealed by the 2.8 kb specific band and the 1.6 kb Ac internal band. A 3.3 kb band, corresponding to the wild type allele, is detectable in most of the F2 mutants; this is probably due to somatic excision of Ac and confirms that the transposed Ac element is still active. These results confirm that the 35S Ac is located in, or in the vicinity of, the Ms41-A gene.

iii) Genomic clones and cDNAs of the Ms41-A gene

Two different genomic libraries, one MboI partial library in EMBL 3A (Clontech) and one HindIII partial library in Lambda Dash II (T. Pelissier, S. Tutois and G. Picard, unpublished), were screened with the cloned 3' IPCR product. Four different clones spanning the mutated region were characterised by Southern analysis. One of them, lambda MSE3, which spans the transposon insertion site, was used for fine mapping. It contains the IPCR-hybridising fragments detected on a genomic Southern (HindIII 3.3 kb, SspI 1.8 kb and PstI 4 kb). The entire plant DNA insert in MSE3 is contained on 4 SalI fragments: S1 (5 kb), S2 (4.9 kb), S3 (4.3 kb) and S4 (2.3 kb) (Figure 2). The S3 fragment contains the plant DNA from the IPCR product.
After sequencing the IPCR product to determine the plant sequence 3' of the Ac element, more than 5000 bp of genomic sequence was obtained from MSE3 (3100 bp from the 5' Ac flanking region and 1900 bp from the 3'). The genomic sequence is presented in Figure 3 and is indexed according to the putative transcription initiation site determined by 5' RACE (see below). One of the SalI sites of fragment S3 is positioned at 2061 bp; the other is situated upstream of an EcoRI site (-1753 bp) and has not been sequenced. The transposon is inserted at position +318 bp.

To identify mRNAs expressed in the region of the transposon insertion site, three Arabidopsis cDNA libraries were probed with either the S1 or the S3 fragment: a developing flower buds library (young buds) (Weigel et al., Cell, 69:843-859 (1992)), a library from flowers at late stages (after stage 10) (Hofte et al., Plant Journal, 4:1051-1061 (1993)) and an immature siliques library (Giraudat et al., Plant Cell, 4:1251-1261 (1992)).

Two classes of cDNAs were recovered with the S3 fragment as a probe and characterised:

- a 1.9 kb cDNA (W11), isolated from the developing flower buds library. Its 3' end is located 1.5 kb upstream of the 3' end of the 35S Ac, suggesting that it is not linked to the Ms41-A phenotype. Sequencing of the extremities revealed that the EcoRI site (-1753 bp in Figure 3) is present in the 3' part of this mRNA.

- a 0.8 kb cDNA (G6), isolated from the immature siliques library but also present in the developing flower buds library. Comparison of G6 and genomic sequences shows that the transposon insertion site is 1440 bp upstream of the 5' end of the longest G6 cDNA (861 bp). In addition, the lack of a methionine codon in the 5' sequence of G6 indicated that this cDNA was not full-length. Further attempts to obtain longer cDNAs from the three libraries were unsuccessful.

Another cDNA (A6) of approximately 1 kb was isolated using the S1 fragment as a probe. It maps downstream of the G6 message.

Of the 3 transcription units in the vicinity of the transposon insertion site, the best candidate for the Ms41-A mRNA was that corresponding to G6. To obtain a full-length G6 cDNA, primers were designed to the 5' end of the longest G6 cDNA and used in a 5' RACE reaction (5' AmpliFinder kit, Clontech). This proved unsuccessful, probably because the 5' end of G6 lies far upstream of the longest cDNA obtained. Therefore, primers were designed to regions of the genomic sequence upstream of the 5' end of the longest G6 cDNA. These, in combination with primers designed to the G6 cDNA, were used in RT-PCR reactions to define the extent of the G6 transcribed region. The results suggested that the G6 message was at least 1 kb longer than the longest G6 cDNA obtained, and that the upstream sequence contained an intron of about 450 bp.

The G6 transcriptional start site was finally mapped by 5' RACE using primers Z3 (5' TTATCATCAACATCGCCATCGAATCTGCCG 3', positions 494-464 bp in Figure 3) and W1 (5' AAAGTAGTAAACCCTAGAG 3', positions 279-260 bp). RT-PCR was then used to recover a nearly full-length G6 message. Comparison of the G6 and genomic sequences shows that the first ATG is situated at position 157 bp; thus G6 putatively encodes a protein of 584 amino acids (Figure 4). Over the region of overlap, the cDNA and genomic DNA sequences were identical. The deduced protein has no significant homology to proteins of known function in the GenBank, EMBL and NBRF databases. The coding sequence consists of three exons, the first of which has been disrupted by the insertion of the 35S Ac element at amino acid position 54 in the Ms41-A mutant. This is strong evidence that G6 corresponds to Ms41-A. Final confirmation was obtained by analysis of phenotypes and of DNA sequences around the Ac insertion site in Ms41-A progeny plants in which the 35S Ac element has excised.
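The statement that the insertion falls at amino acid 54 follows from the coordinates above: counting from the A of the first ATG at +157, position +318 lies 161 bases downstream. The helper below makes that arithmetic explicit. It assumes, as the text implies, that no intron lies between the start codon and the insertion site.

```python
def codon_of(position, atg):
    """Return (codon number, base within codon) for a coordinate downstream of the ATG.

    Assumes the stretch from the ATG to `position` is uninterrupted coding sequence.
    """
    if position < atg:
        raise ValueError("position lies upstream of the start codon")
    offset = position - atg
    return offset // 3 + 1, offset % 3 + 1

print(codon_of(318, 157))  # (54, 3): the insertion lies in codon 54
```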



To induce somatic excision of the 35S Ac element, plants were regenerated from liquid root cultures from single individuals derived from two different test-crosses. These crosses were between plants (A and B) that had only one Ac element but were still male sterile due to imprecise excision of the other Ac element, and male-fertile plants that were heterozygous for Ms41-A:35S Ac. This material was chosen because of the higher percentage of male-sterile plants (40% instead of 20%, 50% instead of 25%?) than in a normal F2 population. Regenerants from clones representing male-sterile plants were scored for male fertility. Numerous completely fertile plants were obtained from some individuals; however, from 5 different regenerated plants from 4 different individuals, 7 different "revertant siliques" were obtained.



DNA from revertant plants, or from progeny of "revertant siliques", was analysed by PCR for excision of the Ac element, and the PCR products were cloned to determine the sequence left by the Ac element (footprint). The oligonucleotides presented in Figure 5 were used: Ac11 with W2 for the presence of the 3' junction, Ac14 with G6 5'-11 for the 5' junction, and W2 with G6 5'-11 or with Z3 for the excision allele(s). The PCR fragments derived from W2 with G6 5'-11 or with Z3 were cloned in the pGEM-T vector (Promega) and sequenced for all revertants. Junction products had previously been sequenced, confirming the presence of the typical 8 bp target site duplication: CTCCTCTC (positions 311 to 318 in Figure 3).

The genotypes of 7 revertant plants or sectors were determined and are presented in Figure 5. For all of them an allele restoring the open reading frame is observed: it is identical to the wild type in 4 cases, carries a 3 bp insertion in 2 cases and a 6 bp insertion in one case. Footprints destroying the coding frame are observed in different revertants and also in the female parents (2 different 7 bp insertions, 2 different 5 bp insertions, and one 9 bp insertion that introduces an in-frame TGA stop codon). Their presence is always associated with segregation of male-sterile individuals in the progeny. These results demonstrate that the Ms41-A protein has a determinant role in male fertility and that the Ms41-A gene has been tagged with the 35S Ac element.
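The rule behind the revertant genotypes is that an insertion preserves the downstream reading frame only when its length is a multiple of 3. Even then it can introduce a stop codon, as with the 9 bp footprint. The sketch below encodes both checks. The ORF in the example is hypothetical, not the Ms41-A sequence.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}

def footprint_effect(insert_len: int) -> str:
    """Frame consequence of an insertion of `insert_len` bp inside a coding sequence."""
    return "in frame" if insert_len % 3 == 0 else "frameshift"

def in_frame_stops(orf: str) -> List[int]:
    """1-based codon numbers of stop codons, reading in frame from the first base."""
    return [i // 3 + 1 for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in STOP_CODONS]

for size in (3, 6, 5, 7, 9):
    print(size, footprint_effect(size))

# Hypothetical ORF with a 9 bp insertion after codon 2 that creates an in-frame TGA.
wild_type = "ATGGCAGCAGCA"
mutant = wild_type[:6] + "TGAGCAGCA" + wild_type[6:]
print(in_frame_stops(wild_type), in_frame_stops(mutant))  # [] [3]
```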

iv) Ms41-A genetic mapping

Classical genetic mapping of Ms41-A with visual phenotypic markers has been described previously in section i) of this example. It places the Ms41-A locus near the bottom of chromosome 1. To determine whether the Ms41-A mutation has been isolated previously in Arabidopsis, the mutation was mapped more precisely using recombinant inbred lines made by Caroline Dean (Lister et al., Plant Journal, 4:745-750 (1993)). This method requires the identification of restriction fragment length polymorphisms (RFLPs) between the two parental lines (Columbia and Landsberg erecta) that lie in, or near, the Ms41-A locus. Polymorphisms were not found in Ms41-A or 5' of it; however, the downstream cDNA, 6A, gives an HhaI polymorphism. Results, processed by MapMaker version 2, have positioned Ms41-A near the markers m532 (1.3 cM) and g17311 (4.6 cM). These RFLP markers are situated on chromosome 1 close to the ADH locus, and map in the vicinity of glabrous 2 and apetala 1 on the integrated Arabidopsis genetic map (Hauge et al., Plant Journal, 3:745-754 (1993)).

Ms41-A is a new male-sterile mutant. It is not allelic to ms1 (Van der Veen and Wirtz, Euphytica, 17:371-XXX (1968)), ms3, ms5, ms10, ms11 or ms12 (Chaudhury 1993). It is also different from the Ms2 gene (Aarts et al., supra).

WO 97/23618    PCT/GB96/03191

v) Abundance of the Ms41-A message

Ms41-A is expressed in 7-day-old seedlings, in young floral buds and in immature siliques (cDNA library and RT-PCR data). The mRNA could not be detected in these tissues by Northern blotting, using poly(A)+ mRNA that had been used successfully in RT-PCR analysis of the Ms41-A message. Thus the Ms41-A message appears to be of very low abundance: approximately 10-fold lower than another message required for male fertility in Arabidopsis, Ms2, in the same cDNA library (1 out of 12 000 plaques for Ms2 (Aarts et al., supra) versus 1 out of 125 000 for Ms41-A).
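Taking plaque frequency as a rough proxy for transcript abundance (an assumption, since library representation can be biased), the quoted frequencies give:

```latex
\frac{1/12\,000}{1/125\,000} = \frac{125\,000}{12\,000} \approx 10.4
```

which corresponds to the approximately 10-fold difference stated above.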
Example 2

Isolation of the Ms41-A promoter and fusion to the β-glucuronidase (GUS) reporter gene

To determine the extent of utility of the Ms41-A promoter in male-sterility systems, putative Ms41-A promoter fragments were linked to the GUS reporter gene and transformed into Arabidopsis and tobacco. This will reveal more precisely the spatial and temporal expression patterns of the Ms41-A gene and determine whether the low abundance of the Ms41-A transcript is due to weak expression or to transcript instability.

Two promoter fragments, -903 (HindIII) to +79 (short promoter) and -1753 (EcoRI) to +79 (long promoter), have been fused to the GUS gene (transcriptional fusions) to produce the binary vectors pBIOS176 and pBIOS177 (Figure 7).
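Figure 3 coordinates are counted from the putative transcription start. The sketch below computes fragment lengths assuming the usual convention that there is no position 0 (the base before +1 is -1). This convention is consistent with the 1883 bp product amplified between -1799 and +84 in the cloning steps. The restriction fragment sizes quoted there also depend on exact cut positions and flanking sequence, so the sketch does not attempt to reproduce them.

```python
def span_length(start, end):
    """Inclusive length between two Figure 3 coordinates, with no position 0."""
    start, end = min(start, end), max(start, end)
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the numbering jumps from -1 to +1
    return length

print(span_length(-1799, 84))  # 1883: Y7 to W3 Xba PCR product
print(span_length(-903, 79))   # 982: short promoter
print(span_length(-1753, 79))  # 1832: long promoter
```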

These plasmids were constructed as follows:-

The primers Y7 (positions -1799 to -1782 in Figure 3)
5' CCTAACTTTCTTTGCGGC 3'
and W3 Xba (positions 84 to 59 in Figure 3)
5' GATCTAGACCGTGATGTCTTAGAAGG 3'
were used in a PCR to recover a 1883 bp Ms41-A promoter fragment. This was cloned into the vector pGEM-T (Promega), forming pS11. This plasmid was introduced into a dam-/dcm- E. coli strain (SCS110), thus allowing the XbaI restriction enzyme to cleave the XbaI site. The 985 bp HindIII, XbaI fragment of pS11 was cloned between the HindIII and XbaI sites of pBI121 (replacing the 35S CaMV promoter of this plasmid), forming plasmid pBIOS176. The 1853 bp EcoRI, XbaI fragment of pS11 was cloned between the EcoRI and XbaI sites of pBIOS4 (a derivative of pBI121), replacing the 35S CaMV promoter of this plasmid and forming plasmid pBIOS177.
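The need for the dam-/dcm- host can be checked against the W3 Xba primer. Its 5' tail reads GATCTAGA, so the XbaI site (TCTAGA) overlaps a Dam methylation motif (GATC), and XbaI does not cut when an overlapping GATC is Dam-methylated. The sketch below flags XbaI sites that overlap GATC. It is a sequence-level illustration only and ignores Dcm methylation.

```python
from typing import List, Tuple

XBAI_SITE = "TCTAGA"
DAM_SITE = "GATC"  # Dam methylase modifies the adenine in GATC

def find_all(seq: str, motif: str) -> List[int]:
    """0-based start of every (possibly overlapping) occurrence of motif."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits

def xbai_sites(seq: str) -> List[Tuple[int, bool]]:
    """(0-based start, overlaps a Dam GATC motif) for every XbaI site."""
    dam = find_all(seq, DAM_SITE)
    sites = []
    for s in find_all(seq, XBAI_SITE):
        end = s + len(XBAI_SITE) - 1
        # Overlap means GATCTAGA or TCTAGATC, contexts in which Dam methylation blocks XbaI.
        blocked = any(g <= end and g + len(DAM_SITE) - 1 >= s for g in dam)
        sites.append((s, blocked))
    return sites

print(xbai_sites("GATCTAGACCGTGATGTCTTAGAAGG"))  # [(2, True)]
```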

To construct pBIOS4, pBI121 was digested with EcoRI, the ends were filled using Klenow polymerase and the plasmid was then religated, forming pBIOSS. This plasmid was digested with HindIII, the ends were filled using Klenow, and an EcoRI linker was ligated into the destroyed HindIII site, forming pBIOS4.

pBIOS176 and pBIOS177 were transformed into Arabidopsis and tobacco. The larger promoter fragment is predicted to contain the entire Ms41-A promoter region, since the EcoRI site lies within the 3' end of the W11 transcript.



Arabidopsis results:-

a) Short promoter:- Histochemical staining reveals that GUS activity is observed in most tissues and is especially high in callus (strong blue staining is detectable after a few hours in X-Gluc (5-bromo-4-chloro-3-indolyl glucuronide)).



b) Long promoter:- GUS activity was seen in callus, but no obvious blue staining was observed in the vegetative parts of primary transformants. However, 75% of the 40 transformants had significant GUS activity in anthers. In the floral buds observed, GUS expression is detected just after the breakdown of the callose wall (floral stage 10); expression appears to be located initially in the tapetum and subsequently in the microspores. GUS activity is still present in mature pollen. However, there may also be GUS activity in the microsporocytes and tetrad microspores, since the GUS substrate may not penetrate the thick callose walls surrounding the microsporocytes and tetrads.

Similar staining experiments were done with plants containing the 3 tapetum-specific promoter fusions TA29 (Koltunow et al., Plant Cell, 2:1201-1224 (1990)), A6 (Hird et al., Plant Journal, 4:1023-1033 (1993)) and A9 (Paul et al., Plant Molecular Biology, 19:611-622 (1992)), and with the microspore/pollen promoter LAT52 (Twell et al., Molecular and General Genetics, 217:240-245 (1989)).

A9 is definitely the earliest, and with the A6 promoter GUS is expressed when tetrads are visible; by contrast, the TA29 promoter gives expression at roughly the same time as Ms41-A. The latter also shows earlier expression in microspores than LAT52. In seedlings of 5 out of 7 transformed plants, very low levels of GUS expression are detected in aerial parts.

Tobacco results

a) Short promoter:- GUS expression appears to be constitutive.

b) Long promoter:- Results were similar to those observed in Arabidopsis, i.e. expression is largely confined to the tapetum, microspores and pollen of the anther. Very low GUS expression was seen in the aerial parts of seedlings; however, no expression was detected in callus.

It appears that expression from the long promoter matches that of the Ms41-A gene, with very low-level "constitutive" expression. Expression in the anther is much stronger than predicted by the abundance of Ms41-A transcript in floral parts, indicating that the Ms41-A message may be very unstable. The higher-level constitutive expression observed from the short promoter suggests that a constitutive silencer is present in the upstream region of the promoter, between positions -1635 and -900 bp. The conserved pattern of expression of the long promoter in tobacco and Arabidopsis suggests that the long promoter will be useful in male-sterility systems in a wide range of plant species. Examples 3 and 4 below demonstrate the use of the long Ms41-A promoter in male-sterility systems.

Example 3

Expression of Barnase from the Ms41-A promoter in Tobacco and Maize

The timing of expression of the Ms41-A promoter in the tapetum is similar to that seen from the tobacco TA29 promoter; thus fusion to cytotoxins such as Diphtheria toxin A (Thorsness et al., Developmental Biology, 143:173-184 (1991)) and Barnase (Mariani et al., Nature, 347:737-741 (1990)) will ablate the anther tapetum, leading to complete male sterility. The long Ms41-A promoter is therefore linked to Barnase. A 1 kb XbaI, HindIII (filled) fragment encoding Barnase is excised from pWP127 (Paul et al., supra) and cloned between the XbaI and SstI (filled) sites of pBIOS177, forming pBIOS177-Barnase (Figure 7).

This plasmid is used to regenerate tobacco and Maize transformants that are male sterile. Although the weak "constitutive" expression of the Ms41-A promoter should prevent recovery of such plants, it is likely that these plants have reduced Ms41-A promoter expression. Thus no significant expression of Barnase occurs in vegetative tissues, whereas expression is sufficient to cause tapetal cell death and male sterility.

Example 4

Expression of antisense Ms41-A from the Ms41-A promoter in Arabidopsis

The Ms41-A promoter can be used to downregulate the expression of genes essential for tapetal function, thus causing complete male sterility. Downregulation can be achieved by expression from the Ms41-A promoter of antisense or sense fragments of the target gene, or by expression of ribozymes that cleave the target gene transcript. One such target gene is Ms41-A. To construct an Ms41-A promoter-Ms41-A antisense chimeric gene, RT-PCR is used to generate a 1923 bp Ms41-A fragment from young Arabidopsis floral bud mRNA. The primers used are:-
W3 Bam, 5' CGGATCCTTCTAAGACATCACG 3' (positions 54-75, Figure 3) and
5' AATGTACTACTACTACTACTTAGGAC 3' (positions 3001-2976, Figure 3).

This PCR fragment is cloned into pGEM-T, forming p542, such that the 5' end of Ms41-A is adjacent to the ApaI site of pGEM-T (Figure 7). The Ms41-A SpeI, ApaI (filled using T4 DNA polymerase) fragment is cloned between the XbaI and SstI (filled) sites of pBIOS177, thus replacing the GUS gene of pBIOS177 and forming pBIOS182 (Figure 8). This plasmid is used to transform Arabidopsis. A proportion of transformants are male sterile, with a phenotype that resembles that of the original Ms41-A mutant. Examples 5 and 7 below describe the use of the Ms41-A transcribed region in male-sterility systems.
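Cloning a SpeI end into an XbaI site works because both enzymes leave the same 5' CTAG overhang (XbaI T^CTAGA, SpeI A^CTAGT). The sketch below derives the overhangs from the recognition sites and shows that the joined site is a hybrid cut by neither enzyme. The junction shown is for one orientation of the join; the other orientation gives ACTAGA, which is likewise neither site.

```python
def five_prime_overhang(site: str, cut: int) -> str:
    """5' overhang left by a palindromic site cut `cut` bases in on each strand."""
    if 2 * cut >= len(site):
        raise ValueError("this cut position does not leave a 5' overhang")
    return site[cut:len(site) - cut]

def junction(left_site: str, left_cut: int, right_site: str, right_cut: int) -> str:
    """Top-strand sequence where a left-hand end meets a compatible right-hand end."""
    return left_site[:left_cut] + right_site[right_cut:]

XBAI = ("TCTAGA", 1)  # T^CTAGA
SPEI = ("ACTAGT", 1)  # A^CTAGT
print(five_prime_overhang(*XBAI), five_prime_overhang(*SPEI))  # CTAG CTAG
hybrid = junction(*XBAI, *SPEI)
print(hybrid, hybrid in (XBAI[0], SPEI[0]))  # TCTAGT False
```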

Example 5

Expression of a 35S CaMV promoter-Ms41-A antisense chimeric gene and a 35S CaMV promoter-Ms41-A sense chimeric gene in Arabidopsis

As described in Example 4, downregulation of the Ms41-A gene by expression of Ms41-A antisense fragments, sense fragments or ribozymes, each driven from the Ms41-A promoter, will lead to male sterility. However, any promoter that has the appropriate pattern of expression, i.e. is active in microsporocyte and/or tapetal cells of the anther at the time of Ms41-A expression, may be used to downregulate Ms41-A and cause male sterility. Thus a CaMV 35S promoter is linked to an antisense Ms41-A fragment and to a sense Ms41-A fragment. The antisense construct is obtained by cloning the ApaI (filled), SpeI p542 Ms41-A fragment between the XbaI and SstI (filled) sites of pBIOS4, forming pBIOS188 (Figure 8).

The sense construct is obtained by cloning the ApaI (filled), SstI p542 Ms41-A fragment between the SmaI and SstI sites of pBIOS4, forming pBIOS186 (Figure 8). These plasmids are transformed into Arabidopsis. A proportion of the antisense and sense transformants are male sterile, with a phenotype similar to that of the original Ms41-A mutant plant.
Example 6

Isolation of an Ms41-A orthologue from Maize

Most methods that use the coding region of Ms41-A in a male-sterility system require the isolation of the orthologous sequence, either from the crop species of interest or from a close evolutionary relative. Such methods include antisense and sense suppression and the use of ribozymes. The degree of evolutionary conservation between orthologous protein sequences is variable and probably depends on constraints on protein function. Although orthologous protein sequences may be highly conserved, codon usage may be quite different, producing orthologous mRNA sequences with low homology. Thus, in order to downregulate the Maize version of Ms41-A, it is probably necessary to isolate the Maize version of Ms41-A. Given the Arabidopsis Ms41-A mRNA sequence, several approaches are possible for the isolation of the Maize orthologue, some of which are outlined below:-

The Ms41-A cDNA can be used as a probe on a Maize Northern or Southern blot at low stringency to see whether an mRNA or genomic band hybridises. This was unsuccessful, indicating that these sequences are widely diverged. The Arabidopsis sequence can be used as a probe in more closely related species, and the orthologues in turn used as further probes until the version in Maize is identified. The cloning and sequencing of such orthologues may also result in the identification of conserved areas that can be used in a degenerate PCR approach. Antibodies to Ms41-A may also be useful, since protein sequences and epitopes are generally more conserved than RNA/DNA sequences.
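A degenerate PCR approach of the kind mentioned here starts from a conserved peptide and back-translates it through the genetic code. The sketch below builds a per-position IUPAC primer and counts the exact number of encoding sequences. The peptide is hypothetical, not a sequence from Ms41-A. Per-position consensus over-represents six-fold residues: for Leu it gives YTN (8 sequences, including the two Phe codons TTT and TTC) although Leu has only 6 codons.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for triplet, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(triplet))

IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
    "V": "ACG", "N": "ACGT"}.items()}

def degenerate_primer(peptide: str) -> str:
    """Per-position IUPAC consensus of all codons for each residue."""
    out = []
    for aa in peptide:
        codons = CODONS[aa]
        out += [IUPAC[frozenset(c[i] for c in codons)] for i in range(3)]
    return "".join(out)

def exact_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)

print(degenerate_primer("MWQ"), exact_degeneracy("MWQ"))  # ATGTGGCAR 2
```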

The approach used was to screen the GenBank and EST (Expressed Sequence Tag) databases for sequences showing homology to the Arabidopsis Ms41-A DNA sequence. Four groups of sequences were identified according to the degree of sequence similarity. Alignments of these sequences are presented in Figure 9.

Group 1

This group contains the Arabidopsis Ms41-A cDNA and an EST sequence from rice, OSS2204 (D40316), which was cloned from a shoot cDNA library (prepared from etiolated 8-day-old seedlings).

Group 2

In this group are two pairs of almost identical Arabidopsis EST sequences, (ATTS3975 (Z37232) and T43470) and (T21748 and R30405), which are presumably derived from the same transcripts and can be considered as two sequences. The R30405, T21748 and T43470 cDNAs were isolated from a library prepared using a mixture of RNA from various tissues. The ATTS3975 cDNA is from a library prepared from cell suspension culture. This group also includes a rice cDNA, OSR1187 (D24087), isolated from a root cDNA library (seedling stage).

Group 3

In this group are 3 EST sequences and 1 cDNA sequence, ATTS1074 (isolated from a cycling cells cDNA library). A partial EST sequence for ATTS1074 is in the database (Z25611); after this sequence was identified as similar to Ms41-A, the cDNA clone was obtained and its sequence completed. The other 3 sequences are all identical or almost identical to the ATTS1074 sequence. The cDNA clones R65265 and T44526 were isolated from a mixed RNA library. ATTS2424 is a 3' EST sequence from the same cDNA clone as ATTS1074; this clone (TAI231) was isolated from a cDNA library prepared from a cell suspension culture containing cycling cells.

Group 4

This group contains sequences of 4 closely related plant transcription factors: Viviparous-1 from maize (McCarty et al., Cell, 66:895-905 (1991)) and rice (Hattori et al., Plant Molecular Biology, 24:805-810 (1994)), ABI3 from Arabidopsis (Giraudat et al., Plant Cell, 4:1251-1261 (1992)) and a Phaseolus vulgaris embryo-specific acidic transcriptional activator, PvAlf (Bobb et al., Plant Journal, in press (1995)).

There is some amino-acid similarity between a region in the N-terminal part of the Ms41-A protein and the proposed DNA-binding domain of maize Viviparous-1. This region is highly conserved between the 4 transcription factors (>80% amino-acid identity between all 4 sequences). This suggests that the Ms41-A protein may have DNA-binding activity, although the Ms41-A protein might instead be sorted via the ER, perhaps to be secreted, since Ms41-A has a putative signal peptide and 6 putative N-glycosylation sites.
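Putative N-glycosylation sites are commonly located with the N-X-S/T sequon, where X is any residue except proline. The minimal scan below runs on a hypothetical peptide. Dedicated prediction tools weigh additional context, so this sketch only illustrates the motif and does not reproduce the count given for Ms41-A.

```python
import re
from typing import List

def n_glyc_sequons(protein: str) -> List[int]:
    """1-based positions of Asn residues in N-X-S/T sequons (X not Pro)."""
    # A zero-width lookahead also reports overlapping sequons such as NNST.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein)]

print(n_glyc_sequons("MNASKNPTQNGT"))  # [2, 10]
```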

The most closely related sequence to Ms41-A identified by this analysis is the rice OSS2204 sequence. This was obtained from the rice sequencing project and used to probe a Maize cDNA library made in Lambda UniZap (Stratagene) from polyA+ RNA isolated from pre-meiotic to meiotic-stage male inflorescences. The cDNA isolated, Zm41-A, is approximately 2.2 kb in length and has a poly(A) tail at its 3' end. Approximately 300 bp of 5' sequence is shown in Figure 10.

This sequence shows strong similarity to the rice OSS2204 cDNA sequence (84% identity) but is only 53% identical to the Arabidopsis sequence. The ORF indicated underneath the DNA sequence is similar both to the proposed OSS2204 ORF (89% identical, 94% similar) and to the Arabidopsis Ms41-A protein sequence (54% identical, 65% similar).
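Percent identity and percent similarity depend on how they are counted. The sketch below counts only alignment columns where neither sequence has a gap, and uses a hypothetical, simplified set of similar-residue groups. The figures above were presumably produced with a substitution matrix and alignment program not specified here, so this sketch will not reproduce them exactly. For nucleotide alignments only the identity figure is meaningful.

```python
from typing import Tuple

# Hypothetical simplified groups of "similar" residues (not BLOSUM62).
GROUPS = ["ILMV", "FWY", "HKR", "DE", "NQ", "ST", "AG"]
GROUP_OF = {aa: n for n, group in enumerate(GROUPS) for aa in group}

def identity_similarity(a: str, b: str) -> Tuple[float, float]:
    """Percent identical and percent similar over ungapped columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in columns)
    similar = sum(x == y or (x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y))
                  for x, y in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)

ident, sim = identity_similarity("MKVLA-EP", "MRILAQDW")  # hypothetical alignment
print(f"{ident:.1f}% identical, {sim:.1f}% similar")  # 42.9% identical, 85.7% similar
```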

A dendrogram of the Ms41-A-related sequences indicates that the Zm41-A sequence falls into group 1 (Figure 11). This indicates that this cDNA is a good candidate for the maize orthologue of the Arabidopsis Ms41-A gene.

Example 7

Expression of an actin promoter-Zm41-A antisense chimeric gene in Maize

The Zm41-A cDNA is linked in antisense orientation to a rice actin promoter. The entire Zm41-A cDNA is excised from pBluescript SK- (Stratagene) as an XhoI (filled), PstI fragment and cloned into PstI, SmaI-cut pCOR113 (McElroy et al., Molecular and General Genetics, 231:150-160). This plasmid is used to transform Maize by a particle bombardment technique. A proportion of the transformants are male sterile, with a phenotype similar to that of the Arabidopsis Ms41-A mutant. This suggests that the Zm41-A sequence is the functional orthologue of Ms41-A, and indicates that any sequence that falls within group 1 (Figure 11) is likely to encode a functional orthologue of Ms41-A.

Example 8

Molecular characterisation of Zm41-A gene(s)

a) Zm41-A gene transcription

By RT-PCR, this transcript has been shown to be abundant in anther RNA; in leaf and tassel RNA populations it is detected at a lower level.

After comparison of the maize and Arabidopsis sequences, it was thought that the cDNA was unlikely to be a full-length clone. With the "Marathon cDNA amplification" kit (Clontech, Palo Alto, CA, USA), 5' RACE experiments were conducted on mRNA extracted from maize anthers at the meiosis stage, which yielded additional 5' sequence. Two types of 5' RACE products were obtained and sequenced. The first contained approximately 150 bp of additional 5' sequence as well as a 108 bp insertion at position 244 in the cDNA. The second contained approximately 130 bp of additional 5' sequence. The first RACE product is believed to result from differential or incomplete splicing of the transcript, giving a 36-amino-acid insertion in the predicted peptide sequence as well as the 52 additional amino acids at the N-terminus of the protein. Even with these additional sequences, the full-length transcript is likely to be longer at the 5' end, based on comparison with the Arabidopsis protein and the maize genomic sequence.

b) Isolation and characterisation of maize genes which are orthologues of Ms41-A

The Zm41-A cDNA was used to screen two different maize genomic lambda libraries. The first was a commercial library (Clontech, Palo Alto, CA, USA) constructed with DNA fragments from maize line B73 plantlets. DNA was partially digested with MboI and the fragments were cloned into the BamHI site of EMBL-3 (Frischauf et al., J. Mol. Biol., 170:827 (1983)). The insert DNA can be excised from the clone with SalI. The second was a lambda library kindly provided by R. Mache (Université Joseph Fourier, URA 1178, Grenoble, France), constructed with DNA fragments from the Mo17 maize line. DNA was partially digested with MboI and the fragments were cloned into the BamHI site of EMBL-4 (Frischauf et al., supra). The insert DNA was excised with EcoRI. Genomic library screening was performed following the instructions of Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1989). 10* recombinant lambda per library were screened, and three rounds of screening were performed. Fourteen positive lambda clones were isolated, one of which was obtained from the library provided by R. Mache.



DNA from positive lambda clones was extracted and purified using Qiagen columns (Chatsworth, CA, USA) according to the manufacturer's instructions. The clones were then characterised by Southern analysis (J. Mol. Biol., 98:503-517 (1975)) in order to establish classes. DNAs from the Clontech library were restricted with HindIII and EcoRI and double restricted with HindIII/SalI. DNAs from the Mache library were restricted with HindIII and EcoRI and double restricted with HindIII/EcoRI. DNA fragments were separated on agarose gel, denatured and blotted onto Hybond N+ membrane (Amersham, Buckinghamshire, UK). The blots were hybridised with 32P-labelled Zm41-A cDNA isolated after digestion with BamHI and XhoI (the resulting fragment is 2.1 kb long).

Ten lambda clones were different and were distributed in three classes:

class A, comprising 5 clones (Z9, Z23, Z27, Z35 and Z36);
class B, comprising 4 clones (Z7, Z28, Z29 and Z33); and
class C, with only one clone, Z31, isolated from the R. Mache library.



In order to study the sequence of these three classes, three different genomic phages (Z31, Z33 and Z35) were subcloned in the plasmid pBSII SK' (Stratagene, La Jolla, CA, USA) according to classical cloning methods (Sambrook et al., supra). Hybridising fragments were first selected. After sequencing of the fragment ends with universal primers, oligonucleotides were designed and sequencing was completed by primer walking.



With the clone Z31, 7.8 kb of continuous sequence data were obtained (see Figure 12). To determine the complete gene structure, we sequenced the entire Zm41-A cDNA. This is 2109 bp in length and encodes a putative peptide of 587 amino acids. Comparison between the genomic sequence and the cDNA and 5' RACE sequences indicated that this gene contains at least 12 exons. The insertion reported in the longest RACE products corresponds to the end of intron 4. Thus, the two families of cDNAs might be explained by the presence of two splice sites in this intron. In the genomic sequence upstream of the end of the RACE products, the open reading frame was found to continue for 270 bp before an initiation codon at an NcoI restriction site. Assuming that this initiation site is the right one, the fragment which might contain the promoter sequence is 2.7 kb long, from the HindIII site where the sequence starts to the NcoI site. Translation of the Zm41-A Z31 gene should therefore give a putative protein of 736 amino acids. The Z31 gene structure is depicted in Figure 13.

With the addition of the unspliced sequence (homologous to the end of intron 4), a longer protein might be obtained. Indeed, the longest open reading frame deduced from the Z31 genomic sequence including this insertion sequence exhibits two in-frame stop codons. It is also worth noting that there is a clear polymorphism here, since the RACE products do not show these stop codons. Mis-splicing may be a regulatory mechanism for the expression of the Zm41-A-related proteins, as has recently been demonstrated in maize for another gene (Burr et al., The Plant Cell, 8:1249-1259 (1996)). Therefore, either this gene codes for two proteins (736 aa and 131 aa), or it codes for the 736 aa and 772 aa proteins.

Moreover, a slight difference was observed between the
Zm41-A cDNA and the Z31 genomic sequence in exon 10,
where a small addition is present (15 bp replaced by 36
bp); this is also consistent with genetic polymorphism
between maize lines, since the maize lines used to study
the mRNA and the genomic sequence are divergent (A188,
B73 and Mo17 respectively). Figure 14 provides the
alignment of the Z31 protein (736 aa), deduced from the
longest open reading frame, with the protein deduced
from the Zm41-A cDNA (587 aa). We found 15 amino acid
changes as well as 7 additional amino acids in the Z31
protein, located at position 556 of the Zm41-A cDNA
protein.
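The exon 10 difference fits the alignment: replacing 15 bp by 36 bp adds a net 21 bp, that is seven codons, without shifting the reading frame. The short Python check below makes this arithmetic explicit. The function name is hypothetical, and the check cannot tell whether the inserted bases also create a stop codon; that requires the sequence itself.

```python
def exon_replacement(removed_bp, inserted_bp):
    """Summarise replacing removed_bp bases by inserted_bp bases in a coding exon."""
    net_bp = inserted_bp - removed_bp
    in_frame = net_bp % 3 == 0
    return {
        "net_bp": net_bp,
        "in_frame": in_frame,
        # A whole number of codons only makes sense when the frame is kept.
        "net_codons": net_bp // 3 if in_frame else None,
    }


# Exon 10 difference between the Zm41-A cDNA and Z31 (15 bp replaced by 36 bp).
print(exon_replacement(15, 36))
```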

For the other two genes, Z33 and Z35, 2.9 kb and 5.8 kb
were sequenced respectively (see figures 15 and 16). Z35
contains part of exon 3 and the complete exons 4, 5 and
6 of the Zm41-A cDNA. Z33 is similar to Z35 but lacks
exon 4 and the 3' end of exon 3. Both have the insertion
sequence found in the longest 5' RACE products. In
addition, comparison of the Z33 and Z35 sequences reveals
two deletions in the Z33 gene with respect to the Z35
gene. The first is 686 bp long; it starts in the 3' end
of exon 3 and extends to the end of exon 4 (with
reference to the Z31 gene structure). The second is
located upstream of the sequence homologous to Z31 and
the Zm41-A cDNA and is 808 bp long (see figure 13).
Moreover, the two genes differ in their sequenced 3'
regions.

Given the high level of conservation between these three
sequences, it is possible that the Z35 gene was derived
from Z31 via genetic rearrangements, deletions and/or
insertions, and that Z33 subsequently arose from Z35
through further deletions.



Example 9 

Genetic mapping of Zm41-A loci

Fifty-eight single seed descent (SSD) maize lines derived
from the cross A188 x HD7 (Murigneux et al., Theor. Appl.
Genet., 87:278-287 (1993)) were used for genetic mapping
by RFLP technology. Hybridisation was performed with
radiolabelled Zm41-A cDNA (BamHI-XhoI fragment, 2.1 kb)
on blots containing DNA from the SSD lines and parental
lines, digested with HindIII or EcoRI. Linkage analysis
with the other RFLP markers mapped on this population was
done using the Mapmaker version 2.0 computer program for
Macintosh (Lander et al., Genomics, 1:174-181 (1987)), and
map distances were calculated with the Kosambi function.
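For reference, the Kosambi mapping function converts a recombination fraction r (0 <= r < 0.5) into a map distance d = (1/4) ln((1 + 2r)/(1 - 2r)) Morgans, with inverse r = (1/2) tanh(2d). The Python sketch below implements both directions in centiMorgans. It only illustrates the function named above; it is not a reproduction of the Mapmaker analysis, and the helper names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r in [0, 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))  # 100 cM/M x 1/4


def kosambi_r(cm):
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    if cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(2 * cm / 100.0)  # convert cM to Morgans first


# Recombination fractions implied by some of the distances reported here.
for distance in (2, 19, 26):
    print(f"{distance:>2} cM -> r = {kosambi_r(distance):.3f}")
```

Map distances in cM add along a chromosome, whereas recombination fractions do not; this is why the 3 cM and 16 cM intervals on either side of umc055 sum to the 19 cM separating Zm41-A.B and Zm41-A.C.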

Many bands polymorphic between the parental lines were
revealed: one or two major bands and a few faint bands.
Three loci, named Zm41-A.A, Zm41-A.B and Zm41-A.C, were
found on two different chromosomes. The Zm41-A.A locus,
corresponding to the major bands, was located on the
long arm of chromosome 6, at 26 cM from the RFLP marker
umc132 and at 2 cM from the RFLP marker umc62 (Maize
Genetics Cooperation Newsletter (MNL) (August 1995)
69:248). The Zm41-A.B and Zm41-A.C loci, corresponding
to the faint bands, were located on chromosome 2 and
were separated from each other by 19 cM. The Zm41-A.B
locus lies near the centromere between the umc131 (6 cM)
and umc055 (3 cM) markers (MNL, supra). The Zm41-A.C
locus was on the long chromosome arm between umc055
(16 cM) and umc022 (6 cM) (MNL, supra). According to the
maize mutant genetic map, no obvious male-sterile mutant
maps to these regions. One dominant male-sterile mutant,
Ms21, discovered in 1950, has been assigned to chromosome
6, but not very precisely. This mutation gives sterility
only in the presence of the sksl mutation. Interestingly,
this mutation maps to chromosome 2, in the vicinity of
the Zm41-A.B locus. Hybridisation of a Z31 gene-specific
probe to blots containing DNA from the SSD lines
demonstrated that the Z31 gene corresponds to the
Zm41-A.A locus on chromosome 6.
CLAIMS:

1. A recombinant or isolated nucleic acid sequence
which:

a) encodes the Ms41-A protein from Arabidopsis;

b) encodes a Ms41-A like protein;

c) encodes the ms41-A protein from Arabidopsis;

d) encodes a ms41-A like protein;

e) comprises a promoter sequence which regulates
expression of the Ms41-A protein from Arabidopsis or
a promoter sequence which regulates expression of a
Ms41-A like protein; or

f) hybridises under stringent conditions to nucleic
acid of a), b), c), d) or e) or would do so but for
the degeneracy of the genetic code.



2. Nucleic acid as claimed in claim 1 a) wherein the
DNA encodes a protein having an amino acid sequence as
shown in figure 4.

3. Nucleic acid as claimed in claim 1 b) which includes 
the sequence shown in figures 12, 15 or 16. 

4. Nucleic acid as claimed in claim 1 derived from the
family Brassicaceae or maize.

5. Nucleic acid as claimed in any one of claims 1 to 4
which comprises a promoter, a coding region and a
transcription termination region.

6. Nucleic acid as claimed in claim 5 having at least
a part of the nucleotide sequence shown in figure 3.

7. Nucleic acid as claimed in claim 6 having the
nucleotide sequence shown in figure 3 commencing with the
base pair labelled 1.

8. Nucleic acid as claimed in claim 1 a), b), c) or d)
which includes a promoter sequence which drives
expression in a plant tissue involved in the control of
fertility.

9. Nucleic acid as claimed in claim 8 wherein the
promoter is a tapetum-specific promoter.

10. Nucleic acid as claimed in claim 9 wherein the
promoter is the A3, A6 or A9 promoter derived from
Brassicaceae.

11. Nucleic acid as claimed in claim 1 e) which is 
operatively coupled to a DNA sequence. 

12. Nucleic acid as claimed in claim 11 wherein the DNA
sequence encodes a disrupter molecule.

13. Nucleic acid as claimed in claim 12 wherein the
disrupter molecule is a lytic enzyme, a ribonuclease, a
protease or a lipase.

14. Nucleic acid as claimed in claim 13 wherein the
disrupter molecule is a ribonuclease, preferably Barnase.
15. Antisense nucleic acid which includes a
transcribable strand of DNA complementary to at least a
part of a DNA molecule as defined in any one of claims 1
to 10.

16. Antisense nucleic acid as claimed in claim 15
wherein the antisense nucleic acid is under the control of
a constitutive promoter.

17. Antisense nucleic acid as claimed in claim 16
wherein the constitutive promoter is the CaMV 35S
promoter.

18. Nucleic acid encoding a ribozyme capable of
specific cleavage of RNA encoded by a DNA molecule as
defined in any one of claims 1 to 10.

19. Nucleic acid as claimed in claim 18 which also
includes an appropriate promoter sequence, e.g. a
constitutive promoter.

20. Nucleic acid as claimed in any one of claims 1 to
19 comprising a 3' transcription regulation signal.

21. Nucleic acid as claimed in any one of claims 1 to
20 which is in the form of a vector.

22. A host cell transformed with nucleic acid as 

claimed in any one of claims 1 to 21. 






23. A process for preparing nucleic acid as claimed in
any one of claims 1 to 22, the process comprising
coupling together successive nucleotides, and/or ligating
oligo- and/or poly-nucleotides.
24. A plant cell including nucleic acid as claimed in
any one of claims 1 to 21.

25. A whole plant, or part of a plant, comprising cells
as claimed in claim 24.

26. A protein encoded by nucleic acid as defined in
claim 1 a), b), c) or d).

27. A protein as claimed in claim 26 which has the
amino acid sequence shown in figure 4.

28. The use of nucleic acid as defined in any one of 
claims 1 to 21 in the preparation of a transgenic plant. 


29. A method for the production of a transgenic plant 
which comprises the step of transforming plant 
propagating material with nucleic acid as defined in any 
one of claims 1 to 21. 
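Claim 1 f) extends to nucleic acid that would hybridise "but for the degeneracy of the genetic code". As an illustration of what that wording captures, the Python sketch below translates the first eight codons of the Ms41-A open reading frame from figure 3 (MSPPSATA) together with a synonymously recoded version of them, using the standard genetic code. The recoded sequence is a hypothetical example constructed for this illustration, not a sequence disclosed in the application.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate the complete codons of an upper-case DNA string; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


NATIVE = "ATGTCACCGCCGTCGGCAACCGCC"   # first eight codons of the ORF in figure 3
RECODED = "ATGAGCCCACCTAGTGCTACAGCG"  # hypothetical synonymous recoding

print(translate(NATIVE), translate(RECODED))
print(f"nucleotide identity: {identity(NATIVE, RECODED):.0%}")
```

Both sequences encode the same peptide while sharing only about half of their nucleotides, which is the kind of sequence that might fail to hybridise under stringent conditions yet still falls within the wording of claim 1 f).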
FIG. 3(I)

TCCTAACTTTCTTTGCGGCATTTCTTATAATACTTCGTCAGTTTTCACiAXEC^*^^ 

-1780 -1760 

-1720 -1700 -1681

TTTTGATGACTTTGGGAAATATTTATG 

-1660 -1640 

ATATTATATCGTAACCATGATGTGATAGTGGGCCTTAG^ 

GAAGCAGAGGCCCGTTTCAACGGAGCATAATAAATTGCATTCTCTGTCTTTTGTTTTTAG
-1540 -1520 -1501 

GTTTTTTTTTTAACTGATAGATGTGCCGTCGAAAATAATATTGATAT^ 

-1480 -1460 -1441 

ACAAACATTCTTAACTGACCCACCCATCTATCTGCTATTCCCACGCGCCAAGGAAAATAA 
-1420 -1400 -1381 

TAATAATAGCGAAATTGATTTTACATTTATTTA 

-1360 -1340 -1321

ATTAACAGATTTTAAGGGATTAAAGTGGAAAAGGTAAACCGAAGACAACTTGCCATT^ 

-1300 -1280 -1261

TGATTTACAACAATCCAAATTTAAAAACAAATGGTCCCAGTTTTTAGGGTTGTCACTTAA
-1240 -1220 -1201

ATTTATCGAAATATTTACACTTTAATTGGGTAAAAC 

-1180 -1160 -1141 

HindIII

GTGACAAACAAAAAAACATGTTTTCACCAAGAAAAACAAAAACAAAAAAGATGTA^GCI 
-1120 -1100 -1081

TTTCTTACATCTGTACAAAATAAAAGCAGACGAAATTGTACT 

-1060 -1040 -1021 

TGTCGGTATGTTTTATATGTTGTGAAAAGTAGAATGGATAACCAAATAAAAATTACTGCA 

-1000 -980 -961

HindIII

TCTTAATAAAGTTGGTTCAACCGGTTTAAAATGTATTTTT^ 

-940 -920 -901 

CTTTTTTCGATTATCGAATTGCAACAAACAAATATATTAACAGAAAAAAGGAATCATGTA 
-880 -860 -841 

TCTATTTCAATATCCTGTTTTTTTTCTTCCATTTGGATATTTA 

-820 -800 -781 

ATCTTCTTCTTAAATTAAAACAGAAAAAAAGATTAAAAGTAAGACAGCTTGCTAATGGCA 
-760 -740 -721 

ACCGCAACAAACAAGATAATTTTGAAACGGATCCACTTGG 

-700 -680 -661
FIG. 3(II)



AAAATTG AC AAATTG C TTTTCT AT AAAAAC .^AAAAATGT ACCGT AAAAAC AC AC AC AT AA 
-640 -620 -601

AAAATAAAAAGTGATAATGACAAACAAATAAAGAGGTATTTTTCTTTTATCTACTAATGT
-580 -560 -541

GATTATAAAAAAATCGACATTGAAAATTTCAACACATCTTTTTCGCCAAA^ 

-520 -500 -481

GGTCTTATTATAACATAAATTAGTTTTTTTGTCTTTCTATTATATATTCAATAACTCATC 
-460 -440 -421 

r CAACTTGAACAAACCTATAAGTTCCGTAGTGTTCTTTTCTGTTGTGACAAAAAATACTA 
-400 -380 -361

-CTAACGAGGGATAAGCACAAAAACATGATTAATGTTTCTCTAATCATTCTAAAAATCTA 
-340 -320 -301

CAGGAATATT^'CCTTTTCAGTTTTTTCTTTCTTAAATGCATTTCTTAGTTCTTCATAATT 
-280 -260 -241

<"' AGTG AG TTTTAATAAC AATAAT AAAAAAAAG AGC ATC ATT AATTG AACCTAAAAAT AAT 
-220 -200 -181

GGGAAGAAAAACCAAAAAGATAGAGAGTAAGATGCACGCGCTAAAGATCGAACGGTTAAT 
-160 -140 -121

AGAATCAGGTTAGTGAAGAGAGATATTAAAAGTTTGTTGTCGTGTGGCAAAAACTATAAT 
-100 -80 -61 

TTCCTTCACACAAACAAAAAAAATAAAATCAAACACAAAATCCCGTAGCATCGTAACAGT 
-40 -20 -1

AATTCGCTATTATCTCCTCACCCTCCGCTTTCGCTTCCCTTCTCTGCCCGTTTCAATTCC 
x 20 40 60 

TTCTAAGACATCACGGTCTCTCTCTATAAAAACAGTACCTACCTCTTCTTCTTCTTCTTC 
80 100 120 

MSPPSATA 
ATTCGCTGACTTCGTTTACACTGAAAACAAATACCTATGTCACCGCCGTCGGCAACCGCC 
140 160 180 

GDINHREVDPTIWRACAGAS
GGTGACATCAACCACCGTGAAGTAGACCCGACGATCTGGCGCGCTTGTGCTGGAGCCTCC 
200 220 240 

VQIPVLHSRVYYFPQGHVEH
GTCCAGATCCCTGTCCTTCACTCTAGGGTTTACTACTTTCCACAAGGTCACGTTGAGCAC 
260 280 300 

▼ 35S-Ac 5'



I I 

CCPLLSTLPSSTSPVPCIIT 
TGTTGCCCTCTCCTCTCTACTCTTCCTTCCTCCACCTCGCCGGTTCCATGTATCATCACT
320 340 360
FIG. 3(III)



SIQLLADPVTDEVFAHLILQ
TCAATCCAGTTGCTCGCCGATCCGGTTACCGACGAGGTCTTTGCTCACCTTATTCTTCAA
380 400 420



PITQQQFTPTNYSRFGRFDG
CCGATCACGCAGCAGCAGTTTACTCCGACTAATTATTCACGATTCGGCAGATTCGATGGC 
440 460 480 

DVDDNNKVTTFAKILTPSDA
GATGTTGATGATAACAACAAGGTGACTACCTTCGCCAAAATTCTCACGCCTTCTGATGCT 
500 520 540 

NNGGGFSVPRFCADSVFPLL 
AACAATGGAGGTGGCTTCTCCGTTCCTCGTTTCTGTGCTGATTCCGTCTTCCCTCTGCTT 
560 580 600 

NFQIDPPVQKLYVTDIHGAV 
AATTTTCAAATCGATCCACCGGTTCAGAAGCTCTACGTCACTGATATCCATGGAGCTGTT 
620 640 660 

WDFRHIYRGTPRRHLLTTGW
TGGGATTTCAGGCATATCTATCGCGGTACACCGAGGCGTCACTTGCTAACAACGGGATGG 
680 700 720

SKFVNSKKLIAGDSVVFMRK 
AGTAAGTTTGTCAATAGCAAGAAGCTCATCGCTGGAGATTCGGTTGTGTTTATGAGAAAA 

740 760 780 

SADEMYIGVRRTPISSSDGG 
TCTGCAGATGAGATGTACATCGGTGTTAGGCGAACTCCGATCTCAAGCAGCGACGGAGGA 
800 820 840

SSYYGGDEYNGYYSQSSVAK
AGTAGCTATTACGGAGGAGATGAGTATAACGGTTACTACAGTCAGAGTAGCGTTGCCAAG 
860 880 900 

EDDGSPKKTFRRSGNGKLTA 
GAAGATGATGGGAGTCCGAAGAAGACGTTTAGGAGATCTGGGAATGGTAAGTTGACTGCT 
920 940 960 

EAVRSINRASQGLPFEVVFY
GAGGCTGTACGATCGATCAATAGAGCGTCTCAGGGATTACCGTTTGAGGTGGTGTTTTAT 
980 1000 1020 

PAAGWSEFVVRAEDVESSMS 
CCGGCTGCTGGATGGTCTGAGTTTGTTGTGAGAGCTGAAGATGTTGAGTCTTCAATGTCT 
1040 1060 1080 

MYWTPGTRVKMAMETEDSSR 
ATGTATTGGACTCCTGGGACTCGAGTCAAGATGGCTATGGAGACTGAAGATTCTTCTCGG 
1100 1120 1140

ITWFQGIVSSTYQETGPWRG
ATCACATGGTTTCAAGGCATCGTTTCCTCTACTTATCAGGAGACCGGTCCATGGCGTGGA 
1160 1180 1200
FIG. 3(IV)



SPWKQL < intron 1

TCTCCATGGAAGCAGCTTCAGGTATATGATGTTTTTGAAATGGTCTTTGCTCTTCTTATC 

1220 1240 1260 



TCTGTGATGTTGAGTTAATGGAACAATTCAGAATCGATCTTGTATCTGTTGTGTGCAAGC 
1280 1300 1320 

CTTTAAGATGATGTTTAAGTCTCATCCTGGTTATTCAAATGTCAATTGG^ 

1340 1360 1380

TGTTTTGATTGCTGTGTTGTTTGTTTTGAAGCTAAATATTC 

1400 1420 1440

CATACGAAAATGAATGTTCTGTCTCAGATTCATCTTCTATAAGATGGAATTGAAACTGGA 
1460 1480 1500 

AGATTTGGCTTAGTATTGTgTGTgTTGAGCGTCCGTGATGTAGAGTTGTTTTCATTATCC 
1520 1540 1560

TTCTTTGGCCACGCATTGTACATTGTGTTTGTTAAACTAGAGTTCCTCTGATTAGTCTTA 
1580 1600 1620 

TGAGATACTCCTTTTTTGCCAATATATTCTACTTCCTCTGATTAGTTCCTT^ 

1640 1660 1680 

>0 ITWDEPE I LQNVKRVN P 

CTTGCGTAGATCACATGGGATGAACCTGAGATTCTGCAAAACGTGAAGAGGGTGAATCCA 
1700 1720 1740 

WQVEIAAHATQLHTPFPPAK 
TGGCAAGTGGAAATTGCTGCACATGCAACTCAACTGCATACCCCTTTCCCTCCAGCAAAG 
1760 1780 1800 

RLKYPQPGGGFLSGDDGEIL
AGGTTGAAGTATCCACAACCCGGAGGAGGGTTCTTGAGTGGAGATGATGGAGAAATCCTT 
1820 1840 1860

YPQSGLSSAAAPDPSPSMFS
TATCCTCAAAGTGGACTGTCTAGTGCAGCAGCACCTGATCCAAGTCCTTCTATGTTCTCG 

1880 1900 1920 

YSTFPAGMQGARQYDFGSFN
TATTCTACATTTCCTGCTGGCATGCAGGGAGCCAGGCAATATGATTTTGGGTCTTTCAAT

1940 1960 1980

PTGFIGGNPPQLFTNNFLSP
CCAACCGGATTCATTGGAGGAAATCCTCCCCAGCTATTCACCAATAACTTCTTAAGTCCG 
2000 2020 2040 

LPDLGKVSTEMMNFGSPPSD
CTTCCTGATTTGGGAAAAGT^C^TCAGATGATC 

2060 SalI 2080 2100

NLSPNSNTTNLSSGNDLVGN
AACTTATCGCCTAATAGCAACACCACTAATCTGTCCTCTGGAAATGACCTGGTTGGAAAC
2120 2140 2160
RGPLSKKVNSIQLFGKIITV
r GAGGCCCCCTTTCAAAGAAAGTTAACTCGATTCAGTTGTTTGGCAAGATCATTACCGTG 
2180 2200 2220 

EEHSESGPAESGLCEEDGSK 
GAGGAGCATTCTGAGAGCGGTCCTGCAGAGTCTGGCTTGTGTGAAGAGGATGGCAGCAAA 
2240 2260 2280 

ESSDNETQLSLSHAPPSVPK
GAGTCCAGCGACAATGAGACACAGTTGTCCTTATCACATGCTCCTCCAAGCGTGCCTAAA 
2300 2320 2340 

HSNSNAGSSSQ < intron 2

CATTCCAACAGCAACGCAGGTTCTAGCTCCCAAGGTATATTCCGATCTCTCTCAAGTACA 
2360 2380 2400 

HindIII

ATAATCAATTGAATCAGTTGCTATAAQmrrTATTACTGTTTTGCAC^ 

2420 2440 2460 

TTCCTTTCCCATGAACTATATTATGTAGAGTAGGAAACACAATCATGATTTCTGATATGA
2480 2500 2520 

CTTGACTGATGATGATACTTGTgAAAACTATCTATATATCTCTTCAGTAATCAGTCGCCT 
2540 2560 2580 

TGAGGTAATTGGAATTTGGAACTTGAACATTACTTGGATTTTAACTTTTCAATAGCATAA 
2600 2620 2640 

GCNTTCCTGTTTCATCATATATGTTTCACTATACTTGTATGCTTTTATTACTGCTGATAT 
2660 2680 2700 

TTACTATTCCTGCTATTTTTTTTGGGTCTCGTTAACGGTAATAAGGACACAGAATTGGC^ 
2720 2740 2760 

CTTTTATCCATCAGAACTAGACATTACTGTACAAGTAGATGAAGAATTATGTGGTTCCAT 
2780 2800 2820 

HindIII

TACAAATTTAATTTCCAGAAAG£II^AAGCTGCTC 

2840 2860 2880 

Hindlll ~-> G * 

GATCCTG AAGCTT GGAATGATTTGTACTTTTCTTTTGTTTGTGTC 

2900 2920 2940 

AAAAGTGAAAGAAGTGGTGGATCTTTGCTGGAATCTCCAAGTCCTAAGTAGTAGTAGTAG 
2960 2980 3000 

TACATTATATATAATTCTGTTGTTTCTGCAATTGACTTTT^ 

3020 3040 3060 

GTGACGATTCCGGTTTTTACTTTCTTTCTTTTTTTTTTATCA^ 

3080 3100 3120 

ATCAACATCTCGCTCTCATCTAATCGTTAACTATTTTTATTGGGGTAAATGTCTGGATTT 
3140 3160 3180 

GTCTTACCTAAACATGTTTTAAGACTGATGTTTATGC 

3200 3220 3240
FIG. 3(VI)

AATGCTTTATTCAATCCCTATGCAATGGATCTCAACTTAACGGCGCCAACCAGAGAGTTT 
3260 3280 3300 

TACTAACTGTCTTTTGCTTTTAGTTAATATTCCTAATAAATAAAAAGACTGCCAATAATA 
3320 3340 3360 

AAATCGGACCATTTTTATTCTCATAATAAATAAAAGAAGCTCAAGGGAGGTCCCTCCTAC 
3380 3400 3420 

ACTTTTCTGACTCCTTTATGTTCTGTTCTCTGTGATTCATTAACGGATCAGCTATAGCAT 
3440 3460 3480 

TTCCAATTTGTCAGTAAGTTAGGGTTGGTTTGGATTAGCTAATAGCTACCAATGAG 
3500 3520

Sequence Range: 1 to 584

MSPPSATAGD INHREVDPTI WRACAGASVQ IPVLHSRVYY FPQGHVEHCC    50
PLLSTLPSST SPVPCIITSI QLLADPVTDE VFAHLILQPI TQQQFTPTNY   100
SRFGRFDGDV DDNNKVTTFA KILTPSDANN GGGFSVPRFC ADSVFPLLNF   150
QIDPPVQKLY VTDIHGAVWD FRHIYRGTPR RHLLTTGWSK FVNSKKLIAG   200
DSVVFMRKSA DEMYIGVRRT PISSSDGGSS YYGGDEYNGY YSQSSVAKED   250
DGSPKKTFRR SGNGKLTAEA VRSINRASQG LPFEVVFYPA AGWSEFVVRA   300
EDVESSMSMY WTPGTRVKMA METEDSSRIT WFQGIVSSTY QETGPWRGSP   350
WKQLQITWDE PEILQNVKRV NPWQVEIAAH ATQLHTPFPP AKRLKYPQPG   400
GGFLSGDDGE ILYPQSGLSS AAAPDPSPSM FSYSTFPAGM QGARQYDFGS   450
FNPTGFIGGN PPQLFTNNFL SPLPDLGKVS TEMMNFGSPP SDNLSPNSNT   500
TNLSSGNDLV GNRGPLSKKV NSIQLFGKII TVEEHSESGP AESGLCEEDG   550
SKESSDNETQ LSLSHAPPSV PKHSNSNAGS SSQG                    584

FIG. 5

LIST OF PRIMERS



Ac 11 



5' CGTATCGGTTTTCGATTACCGTATT 3' 
located at position 4419-4443 on Ac sequence 



25-mer



Ac 14 



5' CGTTTCCGTn'CCGTTTACCGTTTT 3' 
located at position 145-127 on Ac sequence 



25-mer



W2 



5' TGCTTGTGCTGGAGCC 3' 19-mer
located at position 221-237 on 41-a genomic sequence 
Concentration: 5296 ng/ul

1.0633 nmoles/ul 



73 



5' GTTATCATCAACATCGCCATCGAATCTGCCG 3' 31-mer
located at position 495-465 on 41-a genomic sequence
Concentration: 13811 ng/ul

1.4555 nmoles/ul 



G6 5'-11



5' CTGCTGCTGCGTGATCGG 3' 18-mer
located at position 438-421 on 41-a sequence
Concentration: 4943 ng/ul

0.8828 nmoles/ul



COMMENTS 



Length of amplification product



W2/G6 5'-11 217 bp
W2/Ac11 240 bp

Ac14/G6 5'-11 265 bp



Ac 11 


Ac 14 




W2 


G6 5--11 23 







► 1 +318 



genomic sequence
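Two of the primers in figure 5 are listed with descending coordinates (495-465 and 438-421), which indicates that they anneal to the opposite strand. Assuming that the "41-a genomic sequence" coordinates are those of figure 3, this can be checked by taking the reverse complement of the corresponding stretch of figure 3. The Python sketch below does so for primers 73 and G6 5'-11; the excerpt covers figure 3 positions 421-495, and the helper names are illustrative.

```python
# Figure 3 positions 421-495, transcribed from the sequence lines labelled
# 440-480 and 500-540. Assumption: the "41-a genomic sequence" coordinates
# in figure 5 follow the figure 3 numbering.
FIG3_START = 421
FIG3_421_495 = (
    "CCGATCACGCAGCAGCAGTTTACTCCGACTAATTATTCACGATTCGGCAGATTCGATGGC"
    "GATGTTGATGATAAC"
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def region(start, end):
    """Inclusive 1-based slice start..end of the figure 3 excerpt."""
    return FIG3_421_495[start - FIG3_START:end - FIG3_START + 1]


# Primer name -> (sequence 5'->3', low coordinate, high coordinate), from figure 5.
REVERSE_PRIMERS = {
    "73": ("GTTATCATCAACATCGCCATCGAATCTGCCG", 465, 495),
    "G6 5'-11": ("CTGCTGCTGCGTGATCGG", 421, 438),
}

for name, (primer, low, high) in REVERSE_PRIMERS.items():
    match = reverse_complement(region(low, high)) == primer
    print(f"{name}: {len(primer)} nt, anneals to the opposite strand: {match}")
```

Both primers are exact reverse complements of the stated figure 3 positions, which supports reading the figure 5 coordinates in figure 3 numbering.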
Wild type   Cys Pro Leu Leu Ser Thr Leu   Alleles

Sequence TGC CCT CTC CTC TCT ACT CTT 

TGCCCT CTC CTC TC (3' Ac 5') C TCC TCT C TA CTC TT ms41::35SAc

Female A or B

TGCCCT CTC CTC AG T CCT CTC TAC TCT T ms 41-1 (+7 bp) 



Male parent 



Revertant H 



Revertant K 



Revertant F 



Revertant C 



Revertant M 



Revertant A 



Revertant L 



TGCCCT CTC CTC TC (3' Ac 5') C TCC TCT C TA CTC TT ms41::35SAc

TGCCCT CTC CTC TC T ACT CTT Ms 41 

TGCCCT CTC CTC TC T ACT CTT Ms 41-R

TGC CCT CTCCT C TCC TCT C TA CTC TT ms 41-2 (+5 bp) 

TGCCCT CTC CTC TC T ACT CTT Ms 41-R 

TGC CCT CTCC AG TCC TCT C TA CTC TT ms 41-3 (+5 bp)

TGCCCT CTC CTC TC TACT CTT Ms 41-R 

TGCCCT CTC CTC T G T CCT CTC TAC TCT T ms 41-5 (+7 bp) 

TGCCCT CTC CTC TC T ACT CTT Ms 41-R 

TGC CCT CTC CTCT GA G TC CTC TC T ACT CTT ms 41-4 (+9 bp)

TGC CCT CK CTC CTC TC T ACT CTT Ms 41-1R (+3 bp)

TGC CCT CTC CTC AG T CCT CTC T ACT CTT ms 41-1 (+7 bp)

TGC CCT OC CTC CTC TC T ACT CTT Ms 41-1R (+3 bp)

TGC CCT CTC CTC TC (3' Ac 5') C TCC TCT C TA CTC TT ms41::35SAc

TGCCCT CTC CTC G T CCT CTC TAC TCT T Ms 41-2R (+6 bp) 

TGCCCT CTC CTC TC (3' Ac 5') CTCC TCT C TA CTC TT ms41::35SAc

Footprints and alleles induced by 35SAc excision from the ms 41-a locus 

FIG. 6 
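The footprints in figure 6 can be interpreted in terms of reading frame. An excision footprint whose length is not a multiple of three shifts the frame of the Ms41-A coding sequence, while an in-frame footprint can still disrupt the protein if it creates a stop codon. The Python sketch below classifies the legible alleles of figure 6 on that basis, reading codons from the TGC (Cys) codon shown in the wild-type row. The sequences are transcribed from the figure with spaces removed; garbled entries such as Ms41-1R are left out, and the classification is a simple heuristic rather than a phenotype prediction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Wild-type Ms41-A footprint region from figure 6 (Cys Pro Leu Leu Ser Thr Leu).
WILD_TYPE = "TGCCCTCTCCTCTCTACTCTT"

# Excision alleles transcribed from figure 6 with spaces removed.
# Only legible entries are included; garbled rows are omitted.
ALLELES = {
    "ms41-1": "TGCCCTCTCCTCAGTCCTCTCTACTCTT",
    "ms41-2": "TGCCCTCTCCTCTCCTCTCTACTCTT",
    "ms41-3": "TGCCCTCTCCAGTCCTCTCTACTCTT",
    "ms41-4": "TGCCCTCTCCTCTGAGTCCTCTCTACTCTT",
    "ms41-5": "TGCCCTCTCCTCTGTCCTCTCTACTCTT",
    "Ms41-2R": "TGCCCTCTCCTCGTCCTCTCTACTCTT",
}


def in_frame_codons(seq):
    """Split seq into complete codons, reading from its first base (the TGC frame)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def footprint_effect(allele, wild_type=WILD_TYPE):
    """Return (extra bases, effect on the reading frame) for an excision allele."""
    extra_bp = len(allele) - len(wild_type)
    if extra_bp % 3 != 0:
        return extra_bp, "frameshift"
    if any(codon in STOP_CODONS for codon in in_frame_codons(allele)):
        return extra_bp, "in-frame stop"
    return extra_bp, "frame kept, no stop"


for name, seq in ALLELES.items():
    extra, effect = footprint_effect(seq)
    print(f"{name:<8} +{extra} bp  {effect}")
```

On this reading, each sterile (ms) allele listed either shifts the frame (+5 or +7 bp) or, for ms41-4 (+9 bp), places a TGA stop codon in frame, whereas the fertile revertant Ms41-2R (+6 bp) keeps the frame without a stop codon.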
pBIOS176






pMS41-A 



GUS 



Nos 
polyA 



pBIOS177







pMS41 - A GUS Nos 

polyA 



pBIOS177-Barnase 




pMS41-A 



Barnase 



Nos 
polyA 



FIG. 7
FIG. 8

Ms41-A cDNA 



p542 




MS4LA 

pBIOS182 




p35S «<-MS41-A Nos 

poly A 



pBIOS1B6 




p35S MS41-A Nos

polyA 



I 

1 kb
Clustal Alignment of 41a related sequences

LLQKVLKQSDVGS 

ZaVPl 515 EKRLTPSDVGK 

S l 3 8 9 7 75 1 IJIKHTYNEEIXQSKRRRNGN^ 

AVKRLARIPHMFCKTLTASDTST 

DGSAEDGVRKGETVKQRFSRMPHMFCKTLTASDTST 



At41a 73 LADE 
OSS2204 1 
Zm41a 1 



TzSvPl LGRIVLPKKEAEVHLP ELKTRDG I S I PMED IGT SRVWNb^YRFWPNNKSRMYLL^ 

OSR1187 I^VIPKQXAERYFXI^DSGX-KXI^^ 

ATTS3975 unuVIPKHHAEKHFPLPSSNVSV-KC^ 

5l748 ^VIPKQKAEKHFPLPSPSPAVTKGVLIN^^ 

At41a GGGFSVPRFCADSVFPLL — NFQIDPFVQKLYVTDIHGAVWDFRHIYR GTPRRHI»LTT I 

OSS2204 HGGFSVPRRAAEDCFPPL- -DYSLQRPFQELVAKDLHGTEWRFRHIYR ! 

Zm41a HGGFSVPRRAAEDCFPPL- -DYSQQRPSQELVAKDLHGTEWRFRHIYR- -GQPRRHLLTT I 



ZmVPl 

ATTS3975 

T21748 

ATTS1074 1 

At41a 

OSS2204 

Zm41a 



ATTS1074 
At41a 



ATTS1074 
At41a 



At41a 
ATTS1074 



At41a 
ATTS1074 



At41a 
ATTS1074 



T-GEFVRSNELQEGD 
GWSRFVKEKNLRAGD 

GWSRFVKEKNLRAGN „^ ommrnrT , osrTM 
GFSGFLRDDESTTTTSKLM 

GWSK^S^IAmSVVB>lRK^Eh« 

GWSGFINKWCLVSGDCSAIPQEVIQffiNFDWGVRRAA-QLKNAISF 
GWSAFVNKKKLVSGD 

* • ****•■ * • *** *' 

bMCRNGNNDGNA- - -AATGRVRVEAVAEAVARAACGQAFEVVYYPRASTPEFCVKAADVR 
VAKEDDGSPKKTFRRSGNGKLTAEAV-RSINRASQGLPFEVVFYPAAGWSEFVVRAEDVE 
# # + ** **. * .♦***.** *. ** *-* ** 

SAMRI RWCSGMRFKMAFE TED S SRI SWEWGTVSAVQVAD P I RWPNS FWRUjQVAWDE PDL 
SSMSMYWTPGTRVKMAMETEDSSRITWFQGIVSST-YQETGPWRGSPWKQLQITWDEPEI 

* * * * * *** * **. * **.**** 

LQNVKRVNPWQVEIAAHATQLH-TPFPPAKRLKYPQP GGGFLSGDDG 

LQNVKRVSPV^VELVSNMPTIHLSPFSPRKKIRIPQPFEFPFHGTKFPIFSPGFANNGGG 

******* ** **. ,* .**.* *... *** • ** ' 

EIL y PQSGLSSAAAPD PSPSMFS - --YSTFPAG- -MQGARQYDFGSF 

ESMCYLSNDNNKAPEGIQGAROAQQLFGSPSPSIXSDL^SSYTGNNKLHSPAMF-LSSF 

* .*..*.. **** . . * * ** 

NPX GFIGGNPPQ LFTNHFLSPLPDLG 

NPRHHHYQARDSENSNNISCSLTMGNPAMVQDKKKSVGSVKTHQFVLFGQPILTEQQVMN 

** . , *** . 



** . * . 



At41a 
ATTS1074 



KVSTEM4NFGSPPSDNLSPNSNT TNLSSGN DLVGNRCPLSKKVNSIQL 

RKRF]^EEAEAEEEKGLVARGLTWNYSLQGI£TGHCKV5MESEDVGRTIJ)LSVIGSYQEL 



At4 la FGKI - - -ITVEEHSE SGPAE SGLCEEDGSKES SD NETQLSLSHAPPSVPK 

ATTS1074 YRKIJUEMFH IEERSDIXTHVVYRDANGVIKRI GDE PFSDFWKATNR1.PI KMD I GGDNVRK 
* ** * .* . * ** * * 



At41a HSNSNAGSSSQG 

ATTS1074 TWITGIRTGENGIDASTKTGPI-SIFA 



FIG. 9
Zm41a 5 prime DNA sequence and proposed ORF

AGGACGGCAGCGCCGAGGACGGCGTACGGAAGGGGGAAACCGTGAAGCAGCGGTTC7CGC 

1 TCCTGCCGTCGCGGCTCCTGCCGCATGCCTTCCCCCTTTGGCACTTCGTCGCCAAGAGCG 
DGSAEDGVRKGETVKQRFSR 

GGATGCCGCACATGTTCTGCAAGACGCTCACGGCCTCCGACACCAGCACGCACGGGGGTT 

CCTACGGCGTGTACAAGACG7TCTGCGAGTGCCGGAGGCTGTGGTCGTGCGTGCCCCCAA 
M PHMFCKTLTASDTSTHGGF 

TCTCCGTGCCGCGCCGCGCCGCCGAGGACTGCTTCCCGCCTCTGGACTACAGCCAGCAGC 

12i ♦ — — * * *" * 

AGAGGCACGGCGCGGCGCGGCGGCTCCTGACGAAGGGCGGAGACCTGATGTCGGTCGTCG 

SVPRRAAEDCFPPLDYSQQR 
GACCGTCGCAGGAGCTTGTGGCCAAGGATTTGCACGGAACCGAGTGGAGGTTCCGCCACA 

- — - - + — — - ~ ~T 

CTGGCAGCGTCCTCGAACACCGGTTCCTAAACGTGCCTTGGCTCACCTCCAAGGCGGTGT 
PSQELVAKDLHGTEWRFRH I 

TTTATCGAG^CAGCCCCGCAGACACCTTTTAACCACTGGATGGAGTGCCTTTGTCAACA 

241 4. -t ♦ «■ * """ " 

AAATAGCTCCCGTCGGGGCGTCTGTGGAAAATTGGTGACCTACCTCACGGAAACAGTTGT 

YRGQPRRHLLTTGWSAFVNK 
AGAAGAAGCTTGTCTCAGGGGAC 



301 



323 



TCTTCTTCGAACAGAGTCCCCTG 
KKLVSGD
FIG. 12(I)

aagctttagt gactagtgag agtgatttgt tgtgttcttt tgagctcttg cgcttggatt gctttcttct 

140 
* 

ttctcattct ttcttgagat caatactcac ttgtaaccga ggcaagagac accaattgtg tggtggtcct 

210 
* 

tgcgggtaag ttttgttccc ggttgatttg agaagagaaa gctcactcgg tccgagggac cgtttgaaag 

280 
* 

agggaagggg ttgaaaaaga cccggccttt gtggcctcct caatggggag taggtttgcg aaaaccgaac 

350 

ctcggtaaaa caaatccgcg tgtcacactt cttatctgct tgcgatttgt ttttcaccct ctctcgcgga 

420 

ctcgattata tttctaacgc taacccgact tgtagttgtg attaactttg taaatttcag tttcgcccta 

490 
* 

ttcacccccc tctatgcgac tttcagtagt tcatctatcc catgttttac ccctatttgc ttggatctga 

560 
* 

gctgattgcg acttagagac taaactgctg aacttatgaa cctgtgaata aaatactaag taaactagtt 

630 
* 

agtccgaatg tttgtgatag tcatcaagca ccaaaatcaa tataaaaatg gtttaaggcc aatttccttt 

700 
* 

cgcaaagata tggaatgtca taacccgtca atccttcatg taacaatggt cgtgcgttcc ctcaaccata 

770 
* 

caaagggaca tggccgcact gaaaaggcag acacacatag ttttacatat tttctacgct agcacaatag 

840

cctcgttctc cactctgcaa ctcacgaaaa cagtaacaaa aacttcaaca acatactagg catattttct

910 
* 

ctccaaactg gtctaaaaac tctcttcaaa ctcacttcga gcaaggtaat cgggacatta gcaccgcaat 

980 

ccctttccta aactccaatc tacttgtcat ggggttgaaa tcacgatagt taatgtgcta ggtaaggggt 

1050 

atggcgcgtt gcatttagct ttcgatggga ttcgatcgtt ttcccatgac gccttcactc tcgaaaccaa
FIG. 12(II)

tgtcacattt tgagatcttg gactttgttt ccaccaaggg attgcgccat gcagctcctc acccgcgtcc

1190 
* 

ggacggtagc acacgagagg aaccatgaag gcgccttcga catgcgggcc ttggatgggt cgacgaaaaa 

1260 
* 

ggtcctaggt tcgcggccta atgtcgcacg acccgatgct tcgtacaagg tctatagaac ggttagggca 

1330 
■* 

taagggcacc tagttcaaaa aactcaaaag ggcccaaccg agggtagggt tggcagcggg cgacgaagcg 

1400 

aatatagccg cacgtgccac cacacaaaat gagggctaat cttcgatgtc gcaccgtcta ggacccatca 

1470 



tgcatgtagg atccatcttc 



gatgtcaatc acgatccctc atgctcttac gacctcctcg acacgccttc 



1540 
* 



gtgcgattgc tagaggacat tgtcgacgga atatcccctc tttgccctgt gacttggatg attatacatt 

1610 
* 

tatggtaggt gttatggatg ttataaatgg atgtatatga atgtgtgtgt atctatgtgt tgtggatgaa 

1680 

* 

tataaataat tattttctaa ctggtaagaa tcatttctgg tgactaggtt cagtcgataa aaattagtat 

1750 

gtctaatttg tgtattatgt ctatgaaaat tagttaattt tagtttatta atcttcaaaa gttacagacc 

1820 
# 

gacgaaaact agactatcag tcacaactgg taagaaggaa caacgacaac agagatgcca agttactggc

1890 
* 

ttactgcagc aagctaccgt tttctgcccg cgtgtacatt gaagcacagg tgcgtctaca ctctacgctc 

1960 
# 

tcgagtccaa tataaaaata gactgttggg cacctattgt acccgtaccc ctgttcctgc tcctgccgca 

2030 

gtactgaatt ctgctgctgc tacactcctc tgtccgcatc catccacgtc tctctcctct gccgcccgcc 

2100 

tgcgccaccc atcactgtgc gcgtctcccg catcgtccgc tctctttctt tt-tcaccctt tcccggccca
FIG. 12(III)



2170 



tcttctcttt ttacatctgc aacggcaggc cggctgcggc agcggcagcg gcagctgacc agtgaccgac 

2240 
* 

cacccccaca ccactccggc gccccaatcc tcccccttct tctttttcac tactactact gtactgcacg 

2310 

gtcgccaagc gccagaacgc agtggagaac ggggggcagg actccaacaa gcgttgattt ctgccggcac 

2380 
* 

gcacggcacg ggcacgggca cgggcacggg cgtcccccct cactcacgca ccctgcgtct tttccggctg 

2450 
* 

ccgctgctgg ctggctggct ctggctcaca gctacaggct acagtgaccg ccacgcaacc cacactgtct 

2520 
* 

ctgtcctcgt ctccctctcc cctcctagct ctagctggat aggtgggctc tggggaggag gaggagggta 

2590 
* 

gctaggtagt agctgcctat aggcctcggc ccccattcat ggccattacc acgatgtgtc accccaccac 

2660 
* 

accgccctct ccgatgctgc ctccctcatg ataaccctct ccctggtggt tgttctttgc cttgttgccg 

2730 

* 

tgcagcctcc acccccaccc tcctcattaa tcacttgcta gctccctgcc tccctcccgg ctcccgctcc 

2800 

♦ 

cccttctcgt gcttcgcgcc cccgcagcag ccATGGCGGG GATCGACCTC AACGACACCG TGGAGGAGGA 

2870 

CGAGGAGGAG GCGGAGCCCG GCAACGCCTG CTCCCAGCAG AGCCGGACCA GCTCCGCGGC CACGTTCCCG 

2940 
* 

CCGCCGCCGC CGAACCAGCC GAGGCCGAGC GCCGCGGTGT GCCTCGAGCT GTGGCACGCC TGCGCCGGCC 

3010 
* 

CCGTCGCGCC GCTGCCGAGG AAAGGGAGCG TCGTGGTGTA CCTCCCGCAG GGACACATCG AGCACCTCGG 

3080 

CGACGCCGCG GCCGCCGGCG GAGGCGCGCC GCCGCCCGTC GCCCTGCCGC CCCACGTCTT CTGCCGCGTC 

3150 
# 

GTCGACGTCA CTCTCCATgt gcgcgcgccg gttcctactc aatgcgtgcg tgtgtggatt gcccgtgccg 

SUBSTITUTE SHEET (RULE 26) 



WO 97/23618 



PCT/GB96/03191 



20/35 

FIG. 12(IV)

* 

gtgtgcggct tccactgact ctgtccctct tgcgctcgt* gcagGCGGAC GCGTCCACGG ACGAGGTGTA 

3290 
* 

CGCCCAGCTC GCCCTCGTCG CCGAGAACGA Ggtgcgcgca agccacagtg ctccaccggc attggattcg 

3360 
* 

gcttggtttt ctccttgcgt ccacagagac gagatttggg ctgatttggt gtttcttgtg gcgcttgctt 

3430 

▼ 

cgtgcagGAT GTCGCGAGGC GGCTGCGCGG ACGGTCGGAG GACGGCAGCG CCGAGGACGG CGACGAAGGG 

3500
* 

GAAACCGTGA AGCAGCGGTT CTCGCGGATG CCGCACATGT TCTGCAAGAC GCTCACGGCC TCCGACACCA 

3570 
* 

GCACGCACGG CGGCTTCTCC GTGCCACGCC GCGCCGCCGA GGACTGCTTC CCGCCTCTGg tacgcttgcg 

3640 
* 

ttggcttgga aagcttccat cttttgggtg cccgggtgct gctctcaagt gcgattctga atcatctgct 

3710 

# 

cttggggcgt gcagGACTAC AGCCAGCAGC GACCGTCGCA GGAGCTTGTG GCCAAGGATT TGCACGGAAC 

3780 
* 

CGAGTGGAGG TTCCGCCACA TTTATCGAGg tacatgaaca aataatgaga tacaagacga gcacatctac 

3850 
* 

ctatttcttt agcaaactta tgtgcttgct cgccctgaat cattcagtgt cagcgaatga tgtcaatggc 

3920 
* 

tgcacttcag ttggtgattg ttagcgtttt tttacaggat ttgcattact tgtttggatt gagcacttgg 

3990 
* 

gaatgcttca tctttgctca cttaagtcca ggatttgaag tcattgttca gtcactcttt tgctatatat 

4060 
* 

gtcaccatta tgtgatcaga actactaatg gttatatgtt gagagagata tacaaactat gtcaatgttt 

4130 
* 

cctgctgtct gcatttgcaa ccttgtgcgc tatgctcagc atttctcatg tcattggtta gttattgtag 

4200 
* 

tcgtacttaa aatttaccat tttgtccatg aaaaatcatc tgattata tg TTCftSGftgTT CTfyrTCCCSX
FIG. 12(V)

TTTAAGGAAT GTAAAAGAAC alACATGAGA AftrTVTGTCA TGTGTGOTCC ffflOTTTCTQ ATgMTgT^ . 

4340 

frrrTGAATGT GATGCAG GGC AGCCCCGCAG ACACCTTTTA ACCACTGGAT GGAGTGCCTT TGTCAACAAG 

4410 
* 

AAGAAGCTTG TCTCAGGGGA CGCCGTACTA TTTTTGAGgt aggccacagc taacattgga gataattatc 

4480 
* 

acatgttggt gttggccctt tctgaagatt cctcataatt ttcagGGGTG ATAATGGGGA GCTAAGACTT 

4550 

GGAGTGCGCC GTGCAGCTCA GCTTAAAAAT GGATCTGCTT TTCCAGCTCT TTATAACCAG TGCTTAAATC 

4620 
# 

TTGGTTCACT ACCTAATGTT GCACATGCTG TGGCCACCAA AAGTGTGTTC CACATCTACT ACAACCCCAG 

4690 

gtgatgatga atatagcggt ttcactttaa tgcttttgca tgttcaattg ttcatgttgt tggcactctt 

4760 
* 

ttagatgatg tgaactgaaa tgtgctatta actatactct ttcaattgac ggcgatttga aattgtgtca 

4830 
* 

ttttgtgtga tatcatttcc tgagttgttt cgaactatgt aattcatgat tcttactgca attcaacatt 

4900 

aagtgatata taattacttt ttgaattgat attgtcactt acatttggac ccttcaatat aatatagttc 

4970 

cacagctctt tttttagata tcatgacaag tacgcaagta gatctttggt tccttatgta tctcatgtgc 

5040 
* 

atttttacct tcttggaccc tgatgtgttg ctgcaagcct taccttttta tccaccaaca atgatggccc

5110 

tgatggcaat tattgctttc caaaaatctt acagATTAAG CCAATCTGAA TTCATTATAC CATTTTCGAA 

5160 
* 

GTTTATCAAG AGCTTCAGTC AACCATTTTC TGCTGGTTCG AGGTTCAAAG TGAAATATGA GAGTGATGAT 

5250 
* 

GCTTCTGAAA GAAGgttggt gtgctacagt tctcatcttt tacatagatt tatgatggtt gacacatgag
FIG. 12(VI)

agtattatgc agATGCACAG GGATCATAGC AGGAATTGGT GATGCTGACC CCATGTGGCG TGGTTCGAAA 

5390 
* 

TGGAAATGTT TGATGgtatg ttgcctttta agctttaatg attcactttc tgtataactt ttcaggtggt 

5460 



aaatttgtgt tacatatgaa aataatccat gttagataca tgttgaatat aacatgtttc tttatacaga 

5530 
* 

acactaggcg tgtgcatcat gtagctgccg ttgccatcta tttgcactat ttgcttgcta ataaacca.t 

5600 

* 

aagcaatctt gcatatctat ccaataatac aatgcacaac aaatgttgaa aattgcaatt gagagcctac 

5670 
« 

tatgcatccc gtgctccctg agctgtctct gtttgatgta caagtttaat tgtaatgaca catttttttt 

5740 
* 

gcatgtaagt agttctcctt ctccagagca cattctttga tgagcctcat cttagaggca tgttgtatct 

5810
* 

ttatctaaaa gagactgcct tgtgccagcc tggtttcctt gatcagggct ctaagtaaat aagttcattt 

5880 



cattttggtt tcttattgcc ctgcccctga gtgcacattg taggggtaca taataccctc ttgacttagt 

5950 
* 

aagccagttc taaattgccg caatcttaat cctcttgatg acctUcata ttttgtatat a.aeeaatgg 



6020 



ttcatttttg cagGTTCGAT GGGATGACGA TGTAGATTTT CGTCAACCAA ACAGGATTTC TCCTTGGGAG 

6090 
* 

ATTGAGCTGA CTAGTTCAGT TTCAGGATCT CACATGTCTG CACCAAATGC AAAGAGACTG AAACCATGTC 

6160 
* 

TTCCCCATGT TAATCCAGAC TACCTAGTTC CAAgtatgcc ctgttctgcc cagatgttcg cttaatgatt 

6230 
* 

attttgttag cttccgtcat gaataatatt ttcattttga tagATGGAAG CGOTCOTCCT GATTTTGCGG 

6300 

* 

AATCTGCCCA ATTCCACAAG GTCTTGCAAG GTCAAGAATT ACTGGGTTAT AGAACTCATG ACAATGCTGC
FIG. 12(VII)

TGTTGCAACT TCTCAGCCAT GCGAAGCAAC GAACATGCAG TACATTGATG AACGAAGTTG CTCCAACGAT 

6440 
* 

GCGAGTAACA TTATCCCGGG GGTTCCAAGA ATTGGTGTCA GAACACCACT CGGAAGCCCT AGGTTTTCCT 

6510 

• 

ACCGTTGCTC AGGCTTTGGG GAGTCTCCAA GATTCCAAAA GGTCTTGCAA GGTCAAGAAG TATTTCATCC 

6580 

* 

CTACAGAGGA ACTCTGGTCG ATGCAAGCTT GAGTAATAGT GGCTTCCATC AGCAAGATGG TTCTCATGTG 

6650 

* 

CCTACTCAGG CCAGCAAGTG GCACGCACAG CTACATGGAT GTGCTTTTCG TGGCCAACAA GCACCAGCTG 

6720 
* 

TTCCATCTCA ATCCTCATCC CCACCATCTG TCCTGATGTT TCAACGAGGT GATCCAAAGA TGTCCCCATT 

6790 



TGAATTTGGG CATTTCCACG TGAATAAGAA AGAGGATAGA CGCGCAATGT TTGTCCATGC TGGAGGCATC 

6860 

GGAGGAACTG AGCAAACGAC GATGCTCCAG GCTCATCATG TTTCTGGAGG AACGGGAAAC AGAGATGTGA 

6930 

CCGTTGAGAA ATCTCATCCC GCTGTTGCCG CTGCTTCAGA CAACAGGGAA GTTAGCAAAA ACAGTTGCAA 

7000 

AATATTTGGC ATATCTTTGA CCGAGAAGGT TCCAGCAATG AAAGAAAAGG GCTGTGGTGA CATCAACACC 

7070 

AACTATCCAT CCCCCTTCCT GTCTTTGAAG CAACAAGTGC CGAAATCGCT GGGCAACAGC TGTGCCACCg 

7140 

tgagtgtcct acaccatgta gcacccttga tgtctttctc gagtgaagta actcttaact attataaaat 

7210 

cctgcacGTT CATGAGCAGA GGCCTGTTGT TGCTAGGGTG ATTGACGTTT CAACAGTGGA TATGATGATC 

7280 

TGATGTATTG GAAAACTGTC CTGGAGgtga agtcatgcta gtaccacctc tgtcttcatg ctagtgacca 

7350 

tgaacagcat caaagcattt taagctgact gttcttaagc acatcgctta ttgttgttgc cttgtgtttt 
FIG. 12(VIII)

7420 

tgcagGCTGT GTTGCGTAGT GTGGACAGTG TCGGTTTGAT GGTTCGGTAT CGTGAAGACG GGATTTGATT 

7490 

GAGGATCTGG CCAGATTTGT ATCCTAGTTG TAGCTGTTAG AGCACTTTGT ATOACAACCG TGAGTGCTCC 

7560 

GTGTTATCAG CACTAGTTGC TGCTCACAAC TTGCCTCTAT GTTCATAATC TGTATGCCAT GTCAGACCCA 

7630 

TTTATAGAGG GTTTGTTTGC TTGGCATAGT TCTAGACTTA AAGCATTATT ATGAGAACAA ATTTGCTCTG 

7700 

Caccgtatct ttcttacttt caagttggca acggattaac ggtggaggag atgatctgag aggttagttg 

7770 

tgcgacgtat taatggtgtt acatatatta tgcttaggag cattctgcca gctcatttat catatAcatg 

7810 

tcagcacttg atttgttaag tgtagttagt agccttgcac tttgg 
FIG. 14

z31 MT^IDIl^DTVEEDEEEAEPGNACSQQSRTSSAATFPPPPPNQPRPSAAVC 

Zm41-A 

z31 IXLWHACAGPVAPLPRKGSVVVTLPQGHIEHXiGDAAAAGGGAPPPVALPP 

Zm41-A 

z31 HVFCKWUVTIJiADASTDEV^ 

„ . DGSAEDGD 

Zm41-A ***♦♦*#* 

z31 EGETV^QWSFMPHMFCKTLTASDTSTHGGFSVPRBAAEDCFPPICYSQQ

Zm41-A EGETVKQRFSRMPHMPCaSTOTASDTSTHOGFSVPBRAAEDCFPPI^YSQQ

z31 RpsQELVAKDli!GTETOFRHIYRGQPRRHIJ.TTGWSAFSrt^aVSGDA^ 

Zm41-A wsQELVMOMJlCTEWRHfflira W

z31 LFIAGDNGELI^GVRRAAQI^NGSAFPALYNQCLNIXSSLPNV^^

Zm41-A I^l^GDNGEUU^VRWUVQliWGSAFPALYN

z31 VFHIYYNPRLSQSEFIIPFSKFIKSFSQPFSAGSRFKVKTESDDASERRC

Zm41-A VFHIYYNPKLSQSEFIIPFSKFIKSFSQPFSVGSRFECVRXESDDASERRC

z31 TGIIAGIGDADPfrWRGSKWKCIMVPIWDDDVDFRQPMRISPWEIEIiTSSVS
Zm41-A TGIIAGIGDADPMWRGSKHKCLMVRWDDDVDEKQPHRISPWEIELTSSVS

z31 GSHMSAFNAKRIJCPCLPHVNPDYLVFNGSGRPDF

Zm41-A GSHMSAPNAKRLKPCLPHVNPDYLVPNGSGRPDFAESAQFHKVLQGQELL

z31 GYRTHDNAAVATSQPCEATNMQYIDERSCSNDASNIIPGVPRIGVRTPLG

Zm41-A GTOTHDNAAVATSQPCEATNMQYIDERSCSNDASNIIPOVPRIGVRTPLG

z31 sPRFSYRCSGPGESFRPQKVnJQGQEVOT

Zm41-A SPRFSYRCSGFGESPRPQKVIjQGQEIFHPYBGTLVDASLSNTGFHQQDGS

z31 HVTTQASKWHAQUIGK^^GQQAPA

Zm41-A HVPTQASKWHAQLilGCAFRGPQAPAVPSQSSSPPEVI^lFQRGDPKMSPFE



z31 FGHFHVNKKEDBRAMFVHAGGIGGTEQTTMWAHHVSGGTGNRDVTVEKS

Zm41-A FGHFHVNKKEDBRPMEVHAGGIGGTEQTTMWAHHVSGGTGNRDVTVEKS



z31 HPAVAAASDNREVSKNSCKIFGISLTEKVPAMKEKGCGDINTNYPSPFLS

Zm41-A HPAVATASDNREFSKNSCXIFGISLTEKVPAMKEKGCGDINTNINTNY--



z31 LKQQVPKSI^SC^T\mEQRPWARVIWSTVI>MI
Zm41-A PKS3^SCATVKEQRPWGRVIDVSTVT*MI
FIG. 15(I)

gaattcaagg gagaagatga tttatcagca ggctctatga gcacagctgc aaagtcaaga cataattctt 

140 

gggcctctgc aggtgattct cacccctact ctgacattgc ttgcccttca aaaatattca gtcaagacaa 

210 

aaaagaactt actaatcaaa tgtcattatc agtcaatact ttaagataag tagaatcgat gtcccatacg 

280 

acattctagc cacgcactta aacatgtgcc agatatgttc agatcttgtg attcaacaga cctcgacgcc 

350 



gactttcatg tatatctttt aggttgaagc ttttgcttag ttcagtgttg ctatcagaaa gctaaaatta

420

ttttcttgcc acctcctctg cattttttac tgcttcagct cctggtgctt ctaatcgagt actatagaaa 

490 

gcatctccct tgataaatcg ttgtgtgcaa atatagggtg cttatataat ccatcattag agtatgaggc 

560 

gtgctttatt ctatgtgctt cccacaaaaa gagtagccta ttataaactt tgtattagag cacatgacgt 

630 

tctaagtttt gaccacattt ctctactatt atattgcagc cataaagatt caatttttat gttgggcacc

700 

ataaagatgt ttggcaccat tcttcccaaa catttatcta ctattataat gcatgcttta ttcaattttt 

770 

agtattgtta ggggtgaagt cttagtctca agatagcata ttgttgtttg cctactccga cgactctgac

840 

gaggctgctg ccccgcgcca ggagggaggt caagaagcct aagaagccca agGTGAAGCA ACGATTCTCG

910 

CGGATGCCGC ACATGTTCTG TAAGACGCTC ACGGCCTCCG ACACCAGCAC ACACGTCGGC TTCTCCGTGT 

980 

CGCGCaagga cagagcaagc — r^T^T r T ft E^*™ T rATCTCTGOT CCTTGmTC TrATOft &Xftl 

1050 

mftTft TrT A^T r,TRATGCAG G GCAGCCCCGC AGACACCTTT TAACCACTGG ATGGAGTGCC TTTGTCAACA 
FIG. 15(II)

AGAAGAAGCT TGTCTCAAGG GACGCCGTAC TATTTTTGAG gtaggccaca actaacattg gagataatta 

1190 

tcacatgttg gtgttggccc tttctgaagg ttcctcataa ttttcagGGG TGATAATGGG GAGCTAAGAC 

1260 

TTGGAGTGCG CCGTGCAGCT CAGCTTAAAA ATGGATCTGC TTTTCCAGCT CTTTATAACC AGTGCTCAAA 

1330 

TCTTGGTTCA CTACCTAATG TTGCACATGC TGTGGCCACC AAAAGTGTGT TCCACATCTA CTACAACCCT 

1400 

AGgtgatgat gaatatagcg gtttcacttt aatgtttttg catgttcaat tgttcatgtg gttggcactc 

1470 

ttttagatga tgtgaattga aatgtgctta ttaactactc tttcaattga cggggaattt gaaattgtgt 

1540 

cattgtgtgt gatatcattt cctgagttgt ttcgagctat gtaattcatg attcttactg caattcaaca 

1610 

ttaagtgata tataattact ttttgaattg atattgtcac ttacatttgg acccttcaat ataaatcttt 

1680 

ccaattaatg ctctttttat ccactctttg ttgtcaagtt tctgcaattt agaagtatgc tttcttttgt 

1750 

atttaattct ttttaggcca cagattgtta tttcttcatg ccataatttc tctgttttat tagtcatagt

1820

aacagaaata tttttcaatt gttgtggcgg ctggccttga ctgctatggc ggtggccgga ctggccagcg 

1890 

atggcggtgg ccggatagca ccgcgagagc aacgtccaga ggctagcagt tcgttggttg ttgagatttg 

1960 

taccaatgat tatctatatt tagagttgtt gttggataca cccatccatt tagtccttgt ctatctttta 

2030 

cacaaccatc taaactataa atttagctag gattataaat aagctgttgg agttgctctt aggtggctcc 

2100 

tccaatatag gattagtcca tttttctaca aactttgatg tgaattgagt ttctgccaat catepttatat 
2170

atgcatatgt gatgtgaatt gagattcatt gagcaacaca aggattctgt gttggagatg gggtcttaat 

2240 

atttctatca tgtaatatct tttggtagct tgcatcatat taataaaata tctttggtgg cctcaggtct 

2310 

ggtggtaatg cttatgtgat tggtgattct gcaaagcctg agcagaagtg gcacgcctac tatgccacta 

2380 

ctgagcaccc ctgaggagct tgttgttact cttaacatgt gcatgactgg gctggacaag aagagagctt 

2450 

ctgtcttctt ctaggcttct gctgatggtt acacatcttg tgctaaggag atgaccaagc tctcaggtat 

2520 

ctcggacatt atcctataga cagagatctg cgactaattt gttaggttgg ttcttcatca ttttgtagat 

2580 

gcccttcctt ctcgctacat gaactaacta atgacagagg gtggaagtga cccatgaagc tt 

FIG. 15(III)
70



FIG. 16(I)

gtcgacctgc aggtcaacgg atctattgaa ccagcagtct ttgcaattga gatttgactg ccggatttgg 

140 

tttcagcatg gatgcaccac cccacatcat gtggttctag agcatatagt ggtcttgtag cgcctaaaag 

210 
* 

ttttagtagc atcaaatgtc agaaatatat cttcatctcc agaaaatatt agtacttcat aggatgaaaa 

280 

ttgttcaacc tgaaataatt tatttcttgc atccttcagg ttgtatgcga aaccactaga ttgaataatt 

350 
* 

caagaaatct acagaggcag tcgtgaacaa ctatatatgc gcaagattga gcctaaggtt tgtagaccct 

420 
* 

ttaattcata caagggcatt gccatttccc ccgtaatttc gatgcagctc ctttagccat ataacaatga 

490 

aaaccaacga tcctgcaatc ctgaaagggt gaatttatgg gagaagcgta caactccttt agccaatgat 

560 

tccaatgaag caccagccta caagaataag atagataaat taacagggta taaaaatgat actaatcaca 

630 

tgragtaaaa gaaacttaat ccttccactg catcacgtat atgtgagtgc tccctggttt ttcattacag 

700 
* 

tcttgtgatt tccattttat gctcgatgta ggtataggca tctgatggag gacgttttgt ctctactccc

770 
* 

gcatgtgaag aaggacaacc aggacaaggt cgagtccaag cagagcaagg ggaacacgct gaacaagttg 

840 
* 

cttgagttca ggagctgctt cagctgcctt tcttcgaggt atagatattc tactgtgcct ccacacagct 

910 
* 

ggtggaaatt ttgttatcat agatacgatg gcggctgctt acatgtggga atcttacact gtataagtca 

980 

gtggcgcaaa tcaaatctcc aacttgggtt tggtccacct ttcgtgaaat gaatgttttc tgggctttca 

1050 

ggtattgagt aaggagctcc cattttgctc tggtgccaaa ttctctacta ggcaattgac gtttttactg 
FIG. 16(II)

catttgtgac atctgccttc ccacaattat aattgttcaa tatatgtatg cattagactt atcaatttta

1190 

ttaacttatt gaattgtatg tgcatgaagt tttttctttc atgtattaca ccacatgaca tagttcttta

1260 


actaatggca gtgtaccttt tttaaccttt agatggctaa attcaaggga gaagatgatt tattagcagg 

1330 

ctctatgagc acagctgcac agtcaagaca taattcttgg gcctctgcag gtgattctca cccctacgct 

1400 

gacattgctt ggccttcaaa aatattcagt caagacaaaa agaacttact aatcaaatgt cattatcagt 

1470 

caatacttta agataagtag aatcgatgtc ccatacgaca ttctagccac gcacttaaac atgtgccaga 

1540 

tatgttcaga tcttgtgatt cagcagacct tgacgccgag cgggcctccg cggaggcagt agccagatct 

1610 

ggccattgag tgccccgacg ccgctgctta ctcatccatc gccgcggtga cctgctcccc ctcgggcata

1680 

tctgtccatt gacaccaagc atgttctttc ctgaactgtt ctaaaagttc agtttcatgg ttgtttattc 

1750 

ttttgatcag gaaggagaga aagggagaat cagttagaag aaagaagagt ctgaaagctg agtaatttac 

1820 

ctcaacttta ctacccatgt tattaagatc tattgatgat cgtcccactt actcctatga tgcacagact 

1890 

taatggatca tggactgaca tatttatcac gggttttggg ttgtcttcct tcccagtttt gttttaccag 

1960 

tggagacacg aagattggag gacataaggg cgcaacacag gactacagcg agggggaagg ccagatcaag 

2030 

caggagacaa caagaggtgg gttgctgctc attcacaatt tgatatgttt gttttttcgt tgttatagct 

2100 

gaactgcaca tgcagtttga aacatgttgt tactgatgtg tttgtctatt acaggatgtg atagatggtg 
FIG. 16(III)

atctctgtga gcagtatccc tccctcctag ctgatatgca gaggaagatt gctgatgagc tggacagaag 

2240 

tccgacgcct gcagcactgc ttggtgagga ttgccaagga ggaagactag aacaagcaag agcagcgtta 

2310 

atcagtgaca gagcatgatg ccatccagat gggacaagat aagtaagcag tcttatatag tctgcccact 

2380

cgagttttgt atatatttta ggttgaagct tttgcttagt tcagtgttgc tatcggaaag ctaaaattat 

2450 

tttcttgcca cctcctctgc attgttttgc tgcttcagct cctggtgctt ctaatcgagt actatagaaa 

2520 

gcatctctct tgataaatcg ttgtgtgcaa atatagggtg cttatataat ccatcattag agtatgaggc 

2590 

gtgttttatt ctgtgtgctt cccacaaaaa agagtagcct attataaact ttgtattaga gcacatgacg 

2660 

ttctaagttt tgaccacatt tctctactat tataatgcag ccataaagat tcaattttta tgttgggcac 

2730 

cataaagatg tttggcacca ttcttcccaa acatttatct actattataa tgtgtgcttt attcaatttt 

2800 

tagtattgtt aggggtgaag tcttagtctc aagatagcat attgttgttt gcctactccg acgactctga 

2870 

cgaggctgct gccccgcgcc aggagggagg tcaagaagcc taagaagccc aaggtgaaga agcccaagGT 

2940 

GAAGCAACGA TTCTCCTGGA TGCCGCACAT GTTCTGCAAG ACGCTCATGG CCTCCGACAC CAGCATGCAC

3010 

GTCGGCTTCT CTGTGCTGNQ CCGCTCCGCC GAGGACTGCT TCCCGCCTCT Agtacgcttg cgttggnttg 

3080 

gaaagcttcc atcttttcgg tgcccgggtg ctgctctcaa ggtgtgattc tgaatcatct gctcttgggg 

3150 

cgtgcagGAC TACAGCCAGC AGCGATCGTC GCAGGAGCTT GTGGCCAAGG ATTTGCACGG AACCGAGTGG
FIG. 16(IV)

AGGTTCCGCC ACATTTATCG AGgtacatga acaaatactg agatacaagc cgagcacatc tacctatttc 

3290 

tttagcaaac ttatgtgctt gctcgccctg aatcattcag tgtcagcgaa tgatgtcaat ggctgcactt 

3360 

cagttgatga ctgttagcgc tttttacagg atttgcatta cttgtttgga ttgagcactt aggaatgctt 

3430 

catctttgct cacttaagtc caggatttga agtcattgtt cagccactct tttgctatat atgtcaccat 

3500 

tatgtgatca gaactaataa tggttatatg tcgagagaga tatacaaact atgtcaatgt ttcctgttgt 

3570 

ctgcatttgc agccttgtgc gctatgctca gcatttctca tgtcattggt tagttattgt agttgtactt 

3640 

aaaaattacc attttgtcca tgaaaaatca tctgattata tgt Tqh^Q TTCTQCTCTC gTTTftMWA 

3710 

ATGTAAAAGA ACAAACATGA fl AAGCTATGT CATfyTGTGGT CCTTGGTTTC TGftTfiftftTAT GTATCTGftAT 

3780 

GTGATGCAGG GCAGCCCCAC AGACACCTTT TAACCACTGG ATGGAGTGCC TTTGTCAACA AGAAGCTTGT

3850 

CTCAAGGGAC GCCGTACTAT TTTTGAGgta ggccacaact aacattggag ataattatca catgttggtg 

3920 

ttggcccttt ctgaaggttc ctcgtaattt tcagGGGTGA TAATGGGGAG CTAAGACTTG GAGTGCGCCG 

3990 

TGCAGCTCAG CTTAAAAATG GATCTGCTTT TCCAGCTCTT TATAACCAGT GCTCAAATCT TGGTTCACTA 

4060 

CCTAATGTTG CACATGCTGT GGCCACCAAA AGTGTGTTCC ACATCTACTA CAACCCCAGg tgatgatgaa 

4130 

tatagcggtt tcactttaat gcttttgcat gttcaattgt tcatgttgtt ggcactcttt tagatgatgt 

4200 

gaactgaaat gtgcttatta actactcttt caattgacgg ggatttgaaa ttgtgtcatt gtgtgtgata 
FIG. 16(V)

tcatttcctg agttgtttcg agctatgtaa ttcatgattc ttactgcaat tcaacattaa gtgatatata 

4340 



attacttttt gaattgatat tgtcacttac atttggaccc ttcaatataa atctttccaa ttattgctct



4410 

ttttatccac tctttgttgt caagtttctg caatttagaa gtatgctttc ttttgtattt aattcttttt 

4480 

aggccacaaa ttgttatttc ttcatgccat aatttctctg ttttattagt catagtaaca gaaatatttt 

4550 

tcaattgttg tggcggctag ccttgactgc tatggcggtg gccggactgg cctgagatgg cggtggccgg 

4620 

atagcaccgc gagagcaacg tccagaggct agcagttcat tggttgttga gatttgtacc aatgattatc 

4690 

tatatttaga gttgttgttg gatacaccca tccatttagt ccttgtttat cttttacaca gccatctaaa 

4760 

ctctaaattt agctaggatt ataaataagc tgttggatgc tcttaggtgg ctcctccaat ataggattag 

4830 

tccatttttc tacagatggg gtgatagcat gcacattcta gcatacacat gcccttggcc tggtaatgct 

4900 

tggatttttt tctcacgcaa aagaatatac cggttcgttg aattatgtga tgfccattttc tacttttctg 

4970 

ttttttagcc gatcatccga aggctaatga atattaccct gacccaagat tagtagcata tgttgtaccc 

5040 

tatgcaccta tcctatcgtg gtatcactaa tccttctaaa tttgatatca tcttatctga ttcagcttgt 

5110 

tacttgattt aatttggctc cttgttaaca gtacggatgc tgcaaaaaat tccctgagga gaaaggttga 

5180 

aatcttaaaa ttgaagcctc attggtccaa agcttacttc tatttgtggg atgaggtgcg ttattttacc 

5250 

ttttctgcta tgtcctgatt tcaggggaca ccagtgcaga tgcatgtagg gagaaacttg ttgcagttac 

SUBSTITUTE SHEET (RULE 26) 



PCT/GB96/03191 

WO 97/23618 



35/35 



5320 

agaaatggtt tccaatatct actcttgcaa ttgaagatat ggagttactc cttgggttct eetttugtt 

5390 

ttattatgct cgtccagtag acatgctcct gtagtaaact tatattcatg cttgtaattc catttacaat 

5460 

gtgaatattg tgtatagtag ccatgacatg ataatagatt gttagggtca ctcatcaaat attactatgt 

5530 

gccgtcacaa atatgggcac tccactaggg tttagggttt tacctgttgt gcccagttag ggtcctcat 

5600 


caaatattac agagggtatg ttccatttac agttggagta gatacgcatg acgggggcgc acatgagtta 

5670 

ttagtcttgt cgggatctca tgagtctgat tgacgtattt cggatggctc tcgacgtgcg ggtcgacgac 

5740 

ggaacacttg cagcgcccat gttcggatgc agcgacagcc tccttgtgtc ttcgaactcg cgacgagaga 

5810 

gagtggtatt caggactgct tgcttacagg agagaaataa gctaatttct cagaatctta gaagctgatt 

5870

ttacaacagg attgcttgct tacagagttg atcaactaaa aaagcgctat ggttcagaat tc 



FIG. 16(VI)